
Recent Advances in the Message Passing Interface

Content
- Title
- Preface
- Organization
- Table of Contents
- Experience of a PRACE Center
- Achieving Exascale Computing through Hardware/Software Co-design
- References
- Will MPI Remain Relevant?
- Exascale Algorithms for Generalized MPI_Comm_split
- Introduction
- Groups as Chains
- Algorithms
- Serial Sort Algorithms
- Parallel Sort Algorithms
- Hash-Based Algorithm
- Results
- Conclusions
- References
- Order Preserving Event Aggregation in TBONs
- Introduction
- TBON Aggregation
- Order Preserving Aggregation
- Scalability
- Performance Results
- Related Work
- Conclusions
- References
- Using MPI Derived Datatypes in Numerical Libraries
- Introduction
- Distributed Matrix Operations
- Gathering/Scattering Column-Wise Matrices
- Matrix Transpose
- Elemental Cyclically Distributed Matrices
- Experiments
- Column Matrices
- Matrix Transposition
- Cyclically Distributed Matrices
- Concluding Remarks
- References
- Improving MPI Applications Performance on Multicore Clusters with Rank Reordering
- Introduction
- Matching a Communication Pattern to the Hardware Architecture: Issues and Techniques
- General Overview of the Problem
- Core Binding vs. Rank Reordering
- A Non-trivial Implementation of MPI_Dist_graph_create
- Gathering the Hardware Information
- Communication Pattern Information and Metrics
- The TreeMatch Matching Algorithm
- Performance Improvements Evaluation
- The Ring Pattern Benchmark
- ZEUS-MP
- RSA-768 - The Block Wiedemann Algorithm
- Related Works
- Conclusion and Future Works
- References
- Multi-core and Network Aware MPI Topology Functions
- Introduction
- Background and Motivation
- Related Work
- Design and Implementation of MPI Topology Functions
- Design of the Graph Topology Function
- Implementation of Topology Functions
- Experimental Results
- Micro-benchmark Results
- Evaluation Results for MPI Applications
- Implementation Overhead
- Conclusions and Future Work
- References
- Scalable Node Allocation for Improved Performance in Regular and Anisotropic 3D Torus Supercomputers
- Introduction
- Related Work
- Our Approach
- First Attempt - MDF
- Increasing Bisection Bandwidth
- Towards Optimal Placement
- Some Issues
- Placing Real Applications
- Summary of Results and Future Directions
- References
- Improving the Average Response Time in Collective I/O
- Introduction
- Design and Implementation
- Performance Evaluation
- Related Work
- Conclusion
- References
- OMPIO: A Modular Software Architecture for MPI I/O
- Introduction and Motivation
- Related Work
- The OMPIO Set of Frameworks
- The file system Framework (fs)
- The file byte-transfer layer Framework (fbtl)
- The collective I/O Framework (fcoll)
- The file cache Framework (fcache)
- The shared file pointer Framework (sharedfp)
- Experimental Results
- Conclusion
- References
- Design and Evaluation of Nonblocking Collective I/O Operations
- Introduction
- Challenges of Nonblocking Collective I/O Operations
- Collective I/O Algorithm
- A Framework for Nonblocking Collective I/O Operations
- Schedule Caching
- Performance Evaluation
- Experimental Setting
- An Application Scenario
- Discussion
- Conclusion
- References
- Optimizing MPI One Sided Communication on Multi-core InfiniBand Clusters Using Shared Memory Backed Windows
- Introduction
- Motivation
- Contributions
- Background and Related Work
- MPI One-Sided Communication Model
- Existing Implementation for Intra-node MPI One-sided Communication
- Design and Implementation
- Window Creation
- Communication
- Synchronization
- Experimental Results
- Micro-Benchmark Evaluation
- Application Benchmark Evaluation
- Conclusion
- References
- A uGNI-Based MPICH2 Nemesis Network Module for the Cray XE
- Introduction
- Generic Network Interface
- Elements of the API
- Remote Direct Memory Access Transactions and Messaging
- MPICH2 Nemesis
- The uGNI Netmod
- Initialization and Connection Setup
- Eager Message Path
- Rendezvous Message Path
- uDREG Library and Memory Registration
- Network Fault Tolerance
- Basic Performance Characteristics
- Message Rate and Latency
- Bandwidth
- Future Work
- References
- Using Triggered Operations to Offload Rendezvous Messages
- Introduction
- Related Work
- Triggered Operations in Portals 4
- Evaluation Methodology
- Eager Protocol
- Host-Based Rendezvous Protocol
- Triggered Rendezvous Protocol
- Results
- Conclusions
- References
- pupyMPI - MPI Implemented in Pure Python
- Introduction
- MPI with Python
- Related Work
- Overview of pupyMPI
- Concurrency in Python
- Supporting Numpy
- Caching Socket Connections
- pupyMPI API
- General Operations
- Point-to-Point Operations
- Collective Operations
- Other Operations on Communicators
- A Working Example
- Collective Operations
- Topology Reordering
- Algorithm Selection
- Non-blocking Collective Operations
- pupyMPI User's Toolset
- Benchmarks
- Conclusion and Future Work
- References
- Scalable Memory Use in MPI: A Case Study with MPICH2
- Introduction
- Apparent Nonscalable Memory Use in MPI
- Memory Usage in MPICH2
- Link-Time Program Text Size Savings
- One-Sided Communication
- MPI Groups
- Virtual Connections
- Communicator and Topology Information
- Steps to Reduce MPICH2 Memory Consumption
- Implemented Solutions
- Proposed Solutions
- Results
- Scalable Memory Use
- Performance Impact
- Application Impact
- Conclusions
- References
- Performance Expectations and Guidelines for MPI Derived Datatypes
- Introduction
- Related Work
- Derived Datatype Constructors
- Trivial Expectations
- Non-trivial Guidelines
- Packing
- Datatype Preprocessing and Commit
- Initial Experimental Results
- Conclusion
- References
- The Analysis of Cluster Interconnect with the Network_Tests2 Toolkit
- Introduction
- The Components Description
- Description of the Method for Cluster Interconnect Testing
- Modes of Communications Testing Provided by network_test
- Description of Testing Results Visualisation System
- Description of Testing Results Clustering Method
- Results and Conclusion
- References
- Parallel Sorting with Minimal Data
- Introduction
- Communicator Construction
- Related Work
- Algorithm Designs
- Sequential Algorithm
- Counting Algorithm
- Ring Algorithm with O(1) Memory
- Scalable Algorithm
- Experimental Evaluation
- Conclusion
- References
- Scaling Performance Tool MPI Communicator Management
- Introduction
- Related Work
- Original Scalasca Scheme
- Communicator Management During Trace Collection
- Distributed Communicator Tracking
- Unification of Definition Identifiers
- Representation of Communicators
- Rank Translation
- Evaluation
- Conclusion
- References
- Per-call Energy Saving Strategies in All-to-All Communications
- Introduction
- Effect of CPU Throttling on Communication
- All-to-All Energy Aware Algorithm
- Power Consumption Estimates
- Experimental Results
- Related Work
- Conclusions and Future Work
- References
- Data Redistribution Using One-sided Transfers to In-Memory HDF5 Files
- Introduction
- HDF5 File IOs
- DSM Driver and Communicators
- MPI RMA Inter-communicator
- DMAPP Inter-communicator
- Redistribution Strategies
- Mask Redistribution
- Block Cyclic Redistribution
- Random Block Redistribution
- Performance Evaluation
- Internode Micro-Benchmark
- Single Dataset Benchmark
- Contiguous/Linear Distribution
- Block Cyclic and Random Block Redistributions
- Multiple Dataset Benchmark
- Related Work and Discussion
- Conclusion
- References
- RCKMPI - Lightweight MPI Implementation for Intel's Single-chip Cloud Computer (SCC)
- Introduction
- SCC Hardware Architecture
- General Overview
- SCC Memory Architecture
- Hardware Support for Message Passing
- RCKMPI Architecture
- RCKMPI Channels Design
- Shared Memory Communication
- Multiple Buffer Type Operation
- Performance Results
- Concluding Remarks
- References
- Hybrid OpenMP-MPI Turbulent Boundary Layer Code Over 32k Cores
- Introduction
- The Numerical Code
- Computational Setup
- Domain Decomposition
- Global Transposes and Collective Communications
- Blue Gene/P Mapping
- Scalability Results in Blue Gene/P
- OpenMP Scalability
- MPI Scalability
- Parallel I/O
- Conclusions
- References
- CAF versus MPI - Applicability of Coarray Fortran to a Flow Solver
- Introduction
- Numerical Method
- Alignment in Memory
- Traditional Parallelization Approach
- Strategy Following the Coarray Concept
- Tested Communication Schemes
- Data Structures
- Buffered and Direct Communication
- Experimental Results
- Influence of the Memory Layout in Serial and Parallel
- Derived Type and Regular Coarray Buffers
- MPI Compared to Coarray Communications
- Conclusion
- References
- The Impact of Injection Bandwidth Performance on Application Scalability
- Introduction
- Test Platform
- Approach
- Benchmarks and Applications
- Results
- Related Work
- Conclusions and Future Work
- References
- Impact of Kernel-Assisted MPI Communication over Scientific Applications: CPMD and FFTW
- Introduction
- Related Work
- Applications
- Experiments
- Experimental Conditions
- CPMD
- FFTW
- Conclusion
- References
- A Log-Scaling Fault Tolerant Agreement Algorithm for a Fault Tolerant MPI
- Introduction
- Related Work
- Two-Phase Commit Algorithm
- Log Scaling Two-Phase Commit Algorithm
- Results
- Conclusion
- References
- Fault Tolerance in an Industrial Seismic Processing Application for Multicore Clusters
- Introduction
- Fault Tolerance
- The Fault Monitoring Mechanism
- Failure Recovery
- Experimental Results
- Conclusion and Future Work
- References
- libhashckpt: Hash-Based Incremental Checkpointing Using GPU's
- Introduction
- Approach
- Overview
- Library Implementation Details
- Applications and Platform
- Results
- Hash-Based Dirty Data Detection
- Checkpoint File Size Comparison
- GPU Performance
- Viability of Hash-Based Incremental Checkpointing
- Related Works
- Conclusions and Future Work
- References
- Noncollective Communicator Creation in MPI
- Introduction
- Need for Noncollective Communicator Creation
- Fault Tolerance
- Global Arrays
- Dynamic Load Balancing and Multilevel Parallelism
- Noncollective Formation of MPI Communicators
- Experimental Evaluation
- Group Creation Cost
- MCMC Load-Balancing Example
- Discussion
- Group-Collective Communicator Creation
- Generalized Multicommunicators
- Conclusion
- References
- Evaluation of Interpreted Languages with Open MPI
- Introduction
- MPI Language Bindings
- Related Work
- Performance Evaluation
- Conclusions
- References
- Leveraging C++ Meta-programming Capabilities to Simplify the Message Passing Programming Model
- Introduction
- Overview
- The mem_wrap Object
- Jacobi Relaxation
- Conclusions and Future Work
- References
- Portable and Scalable MPI Shared File Pointers
- References
- Improvement of the Bandwidth of Cross-Site MPI Communication Using Optical Fiber
- Introduction and Related Work
- Modified MPI Point-to-Point Algorithm
- References
- Performance Tuning of SCC-MPICH by Means of the Proposed MPI-3.0 Tool Interface
- Introduction and Overview
- A Customized MPI Library for the Intel SCC
- A Prototype of the MPI-3.0 Tool Information Interface
- References
- Design and Implementation of Key Proposed MPI-3 One-Sided Communication Semantics on InfiniBand
- Introduction and Overview
- Design and Evaluation
- Conclusion
- References
- Scalable Distributed Consensus to Support MPI Fault Tolerance
- Introduction
- Algorithm
- Performance Evaluation
- Conclusion
- References
- Run-Through Stabilization: An MPI Proposal for Process Fault Tolerance
- Introduction
- Process Fault Tolerance Model
- Validation of Process State
- Semantic Modifications
- References
- Integrating MPI with Asynchronous Task Parallelism
- Introduction
- HC-MPI
- Experimental Results
- Related Work
- Conclusions and Future Work
- References
- Performance Evaluation of Thread-Based MPI in Shared Memory
- System Design and Performance
- Reference
- MPI-DB, A Parallel Database Services Software Library for Scientific Computing
- References
- Scalable Runtime for MPI: Efficiently Building the Communication Infrastructure
- Introduction and Motivation
- Evaluation
- Conclusion
- References
- Writing Parallel Libraries with MPI - Common Practice, Issues, and Extensions
- Introduction
- Modular Distributed Memory Programming
- A Taxonomy for Parallel Libraries
- Example of Libraries
- Common Requirements of Parallel Libraries
- The Loosely Synchronous Model in MPI
- Where It Breaks
- Reentrant Libraries
- Nonblocking Libraries
- Complex Communication Operations
- Process Synchronization Outside of MPI
- Hybrid Programming
- Thread-Safe Message Probing
- Control Transfer and Threading
- Communication Endpoints
- Guidelines for Library Designers
- What to Avoid!
- Progress
- Summary and Conclusions
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers and smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" form of copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, a personalised watermark is embedded in the eBook; it can be used to identify the purchaser in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.