Recent Advances in the Message Passing Interface

Name: Recent Advances in the Message Passing Interface | 18th European MPI Users' Group Meeting, EuroMPI 2011, Santorini, Greece, September 18-21, 2011. Proceedings
Brand: Springer
Price: 53.49 EUR
Availability: Online only

Yiannis Cotronis, Anthony Danalis, Dimitris Nikolopoulos, Jack Dongarra (Editors)

Springer (Publisher)

Published on 15 September 2011

XIV, 358 pages

E-Book

PDF with digital watermarking

978-3-642-24449-0 (ISBN)

€53.49 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Title
Preface
Organization
Table of Contents
Experience of a PRACE Center
Achieving Exascale Computing through Hardware/Software Co-design
References
Will MPI Remain Relevant?
Exascale Algorithms for Generalized MPI_Comm_split
Introduction
Groups as Chains
Algorithms
Serial Sort Algorithms
Parallel Sort Algorithms
Hash-Based Algorithm
Results
Conclusions
References
Order Preserving Event Aggregation in TBONs
Introduction
TBON Aggregation
Order Preserving Aggregation
Scalability
Performance Results
Related Work
Conclusions
References
Using MPI Derived Datatypes in Numerical Libraries
Introduction
Distributed Matrix Operations
Gathering/Scattering Column-Wise Matrices
Matrix Transpose
Elemental Cyclically Distributed Matrices
Experiments
Column Matrices
Matrix Transposition
Cyclically Distributed Matrices
Concluding Remarks
References
Improving MPI Applications Performance on Multicore Clusters with Rank Reordering
Introduction
Matching a Communication Pattern to the Hardware Architecture: Issues and Techniques
General Overview of the Problem
Core Binding vs. Rank Reordering
A Non-trivial Implementation of MPI_Dist_graph_create
Gathering the Hardware Information
Communication Pattern Information and Metrics
The TreeMatch Matching Algorithm
Performance Improvements Evaluation
The Ring Pattern Benchmark
ZEUS-MP
RSA-768 - The Block Wiedemann Algorithm
Related Works
Conclusion and Future Works
References
Multi-core and Network Aware MPI Topology Functions
Introduction
Background and Motivation
Related Work
Design and Implementation of MPI Topology Functions
Design of the Graph Topology Function
Implementation of Topology Functions
Experimental Results
Micro-benchmark Results
Evaluation Results for MPI Applications
Implementation Overhead
Conclusions and Future Work
References
Scalable Node Allocation for Improved Performance in Regular and Anisotropic 3D Torus Supercomputers
Introduction
Related Work
Our Approach
First Attempt - MDF
Increasing Bisection Bandwidth
Towards Optimal Placement
Some Issues
Placing Real Applications
Summary of Results and Future Directions
References
Improving the Average Response Time in Collective I/O
Introduction
Design and Implementation
Performance Evaluation
Related Work
Conclusion
References
OMPIO: A Modular Software Architecture for MPI I/O
Introduction and Motivation
Related Work
The OMPIO Set of Frameworks
The file system Framework (fs)
The file byte-transfer layer Framework (fbtl)
The collective I/O Framework (fcoll)
The file cache Framework (fcache)
The shared file pointer Framework (sharedfp)
Experimental Results
Conclusion
References
Design and Evaluation of Nonblocking Collective I/O Operations
Introduction
Challenges of Nonblocking Collective I/O Operations
Collective I/O Algorithm
A Framework for Nonblocking Collective I/O Operations
Schedule Caching
Performance Evaluation
Experimental Setting
An Application Scenario
Discussion
Conclusion
References
Optimizing MPI One Sided Communication on Multi-core InfiniBand Clusters Using Shared Memory Backed Windows
Introduction
Motivation
Contributions
Background and Related Work
MPI One-Sided Communication Model
Existing Implementation for Intra-node MPI One-sided Communication
Design and Implementation
Window Creation
Communication
Synchronization
Experimental Results
Micro-Benchmark Evaluation
Application Benchmark Evaluation
Conclusion
References
A uGNI-Based MPICH2 Nemesis Network Module for the Cray XE
Introduction
Generic Network Interface
Elements of the API
Remote Direct Memory Access Transactions and Messaging
MPICH2 Nemesis
The uGNI Netmod
Initialization and Connection Setup
Eager Message Path
Rendezvous Message Path
uDREG Library and Memory Registration
Network Fault Tolerance
Basic Performance Characteristics
Message Rate and Latency
Bandwidth
Future Work
References
Using Triggered Operations to Offload Rendezvous Messages
Introduction
Related Work
Triggered Operations in Portals 4
Evaluation Methodology
Eager Protocol
Host-Based Rendezvous Protocol
Triggered Rendezvous Protocol
Results
Conclusions
References
pupyMPI - MPI Implemented in Pure Python
Introduction
MPI with Python
Related Work
Overview of pupyMPI
Concurrency in Python
Supporting Numpy
Caching Socket Connections
pupyMPI API
General Operations
Point-to-Point Operations
Collective Operations
Other Operations on Communicators
A Working Example
Collective Operations
Topology Reordering
Algorithm Selection
Non-blocking Collective Operations
pupyMPI User's Toolset
Benchmarks
Conclusion and Future Work
References
Scalable Memory Use in MPI: A Case Study with MPICH2
Introduction
Apparent Nonscalable Memory Use in MPI
Memory Usage in MPICH2
Link-Time Program Text Size Savings
One-Sided Communication
MPI Groups
Virtual Connections
Communicator and Topology Information
Steps to Reduce MPICH2 Memory Consumption
Implemented Solutions
Proposed Solutions
Results
Scalable Memory Use
Performance Impact
Application Impact
Conclusions
References
Performance Expectations and Guidelines for MPI Derived Datatypes
Introduction
Related Work
Derived Datatype Constructors
Trivial Expectations
Non-trivial Guidelines
Packing
Datatype Preprocessing and Commit
Initial Experimental Results
Conclusion
References
The Analysis of Cluster Interconnect with the Network Tests2 Toolkit
Introduction
The Components Description
Description of the Method for Cluster Interconnect Testing
Modes of Communications Testing Provided by network_test
Description of Testing Results Visualisation System
Description of Testing Results Clustering Method
Results and Conclusion
References
Parallel Sorting with Minimal Data
Introduction
Communicator Construction
Related Work
Algorithm Designs
Sequential Algorithm
Counting Algorithm
Ring Algorithm with O(1) Memory
Scalable Algorithm
Experimental Evaluation
Conclusion
References
Scaling Performance Tool MPI Communicator Management
Introduction
Related Work
Original Scalasca Scheme
Communicator Management During Trace Collection
Distributed Communicator Tracking
Unification of Definition Identifiers
Representation of Communicators
Rank Translation
Evaluation
Conclusion
References
Per-call Energy Saving Strategies in All-to-All Communications
Introduction
Effect of CPU Throttling on Communication
All-to-All Energy Aware Algorithm
Power Consumption Estimates
Experimental Results
Related Work
Conclusions and Future Work
References
Data Redistribution Using One-sided Transfers to In-Memory HDF5 Files
Introduction
HDF5 File IOs
DSM Driver and Communicators
MPI RMA Inter-communicator
DMAPP Inter-communicator
Redistribution Strategies
Mask Redistribution
Block Cyclic Redistribution
Random Block Redistribution
Performance Evaluation
Internode Micro-Benchmark
Single Dataset Benchmark
Contiguous/Linear Distribution
Block Cyclic and Random Block Redistributions
Multiple Dataset Benchmark
Related Work and Discussion
Conclusion
References
RCKMPI - Lightweight MPI Implementation for Intel's Single-chip Cloud Computer (SCC)
Introduction
SCC Hardware Architecture
General Overview
SCC Memory Architecture
Hardware Support for Message Passing
RCKMPI Architecture
RCKMPI Channels Design
Shared Memory Communication
Multiple Buffer Type Operation
Performance Results
Concluding Remarks
References
Hybrid OpenMP-MPI Turbulent Boundary Layer Code Over 32k Cores
Introduction
The Numerical Code
Computational Setup
Domain Decomposition
Global Transposes and Collective Communications
Blue Gene/P Mapping
Scalability Results in Blue Gene/P
OpenMP Scalability
MPI Scalability
Parallel I/O
Conclusions
References
CAF versus MPI - Applicability of Coarray Fortran to a Flow Solver
Introduction
Numerical Method
Alignment in Memory
Traditional Parallelization Approach
Strategy Following the Coarray Concept
Tested Communication Schemes
Data Structures
Buffered and Direct Communication
Experimental Results
Influence of the Memory Layout in Serial and Parallel
Derived Type and Regular Coarray Buffers
MPI Compared to Coarray Communications
Conclusion
References
The Impact of Injection Bandwidth Performance on Application Scalability
Introduction
Test Platform
Approach
Benchmarks and Applications
Results
Related Work
Conclusions and Future Work
References
Impact of Kernel-Assisted MPI Communication over Scientific Applications: CPMD and FFTW
Introduction
Related Work
Applications
Experiments
Experimental Conditions
CPMD
FFTW
Conclusion
References
A Log-Scaling Fault Tolerant Agreement Algorithm for a Fault Tolerant MPI
Introduction
Related Work
Two-Phase Commit Algorithm
Log Scaling Two-Phase Commit Algorithm
Results
Conclusion
References
Fault Tolerance in an Industrial Seismic Processing Application for Multicore Clusters
Introduction
Fault Tolerance
The Fault Monitoring Mechanism
Failure Recovery
Experimental Results
Conclusion and Future Work
References
libhashckpt: Hash-Based Incremental Checkpointing Using GPU's
Introduction
Approach
Overview
Library Implementation Details
Applications and Platform
Results
Hash-Based Dirty Data Detection
Checkpoint File Size Comparison
GPU Performance
Viability of Hash-Based Incremental Checkpointing
Related Works
Conclusions and Future Work
References
Noncollective Communicator Creation in MPI
Introduction
Need for Noncollective Communicator Creation
Fault Tolerance
Global Arrays
Dynamic Load Balancing and Multilevel Parallelism
Noncollective Formation of MPI Communicators
Experimental Evaluation
Group Creation Cost
MCMC Load-Balancing Example
Discussion
Group-Collective Communicator Creation
Generalized Multicommunicators
Conclusion
References
Evaluation of Interpreted Languages with Open MPI
Introduction
MPI Language Bindings
Related Work
Performance Evaluation
Conclusions
References
Leveraging C++ Meta-programming Capabilities to Simplify the Message Passing Programming Model
Introduction
Overview
The mem_wrap Object
Jacobi Relaxation
Conclusions and Future Work
References
Portable and Scalable MPI Shared File Pointers
References
Improvement of the Bandwidth of Cross-Site MPI Communication Using Optical Fiber
Introduction and Related Work
Modified MPI Point-to-Point Algorithm
References
Performance Tuning of SCC-MPICH by Means of the Proposed MPI-3.0 Tool Interface
Introduction and Overview
A Customized MPI Library for the Intel SCC
A Prototype of the MPI-3.0 Tool Information Interface
References
Design and Implementation of Key Proposed MPI-3 One-Sided Communication Semantics on InfiniBand
Introduction and Overview
Design and Evaluation
Conclusion
References
Scalable Distributed Consensus to Support MPI Fault Tolerance
Introduction
Algorithm
Performance Evaluation
Conclusion
References
Run-Through Stabilization: An MPI Proposal for Process Fault Tolerance
Introduction
Process Fault Tolerance Model
Validation of Process State
Semantic Modifications
References
Integrating MPI with Asynchronous Task Parallelism
Introduction
HC-MPI
Experimental Results
Related Work
Conclusions and Future Work
References
Performance Evaluation of Thread-Based MPI in Shared Memory
System Design and Performance
Reference
MPI-DB, A Parallel Database Services Software Library for Scientific Computing
References
Scalable Runtime for MPI: Efficiently Building the Communication Infrastructure
Introduction and Motivation
Evaluation
Conclusion
References
Writing Parallel Libraries with MPI - Common Practice, Issues, and Extensions
Introduction
Modular Distributed Memory Programming
A Taxonomy for Parallel Libraries
Example of Libraries
Common Requirements of Parallel Libraries
The Loosely Synchronous Model in MPI
Where It Breaks
Reentrant Libraries
Nonblocking Libraries
Complex Communication Operations
Process Synchronization Outside of MPI
Hybrid Programming
Thread-Safe Message Probing
Control Transfer and Threading
Communication Endpoints
Guidelines for Library Designers
What to Avoid!
Progress
Summary and Conclusions
References
Author Index
