Software for Exascale Computing - SPPEXA 2013-2015

Name: Software for Exascale Computing - SPPEXA 2013-2015
Brand: Springer
Price: 96.29 EUR
Availability: OnlineOnly

Hans-Joachim Bungartz, Philipp Neumann, Wolfgang E. Nagel (Editors)

Springer (Publisher)

Published on 14 September 2016

X, 565 pages

E-Book

PDF with digital watermarking

978-3-319-40528-5 (ISBN)

€96.29 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Intro
Preface
Contents
Part I EXA-DUNE: Flexible PDE Solvers, Numerical Methods, and Applications
Hardware-Based Efficiency Advances in the Exa-Dune Project
1 The Exa-Dune Project
2 Hybrid Parallelism in DUNE
2.1 UMA Concept
3 Assembly
3.1 Thread Parallel Assembly
3.2 Higher Order DG Methods
3.3 Low Order Lagrange Methods
4 Linear Algebra
4.1 Efficient Matrix Format for Higher Order DG
4.2 GPU Accelerated Preconditioners and Strong Smoothers
5 Outlook
References
Advances Concerning Multiscale Methods and Uncertainty Quantification in Exa-Dune
1 Introduction
2 Numerical Multiscale Methods: A Case of Generality
2.1 The Multiscale Finite Element Method for Multiscale Elliptic Equations
2.2 Implementation and Parallelization
2.3 Hybrid MPI/SMP Implementation
3 The Multi-level Monte-Carlo Method
3.1 Principle
3.2 Implementation
4 Numerical Experiments
5 Conclusion
References
Part II ExaStencils: Advanced Stencil-Code Engineering
Systems of Partial Differential Equations in ExaSlang
1 Introduction
2 Multigrid Methods
3 The ExaStencils Approach
4 The ExaStencils DSL ExaSlang
4.1 Multi-layered Approach
4.2 Overview of ExaSlang 4
4.2.1 Stencils
4.2.2 Fields and Layouts
4.2.3 Data Types, Variables, and Values
4.2.4 Control Flow
4.2.5 Level Specifications
5 Code Generation
6 Data Types for Systems of Partial Differential Equations
6.1 Motivation
6.2 The ExaSlang Data Types
7 Modifications to the Code Generator
8 Example Application
8.1 Theoretical Background
8.2 Mapping to ExaSlang 4
8.3 Results
9 Related Work
10 Future Work
11 Conclusions
References
Performance Prediction of Multigrid-Solver Configurations
1 Introduction
2 Configurable Multigrid Solvers and the ExaStencils Code Generator
3 Performance Prediction
3.1 Sampling
3.1.1 Binary Sampling Heuristics
3.1.2 Experimental Designs
3.2 Performance-Influence Models
3.3 Integration of Domain Knowledge
3.3.1 Shrinking the Configuration Space
3.3.2 Domain Knowledge on Interactions
3.3.3 Independent Sampling Strategies and Independent Models
3.3.4 Integration of Analytical Models
3.3.5 Models for Disjoint Parts of a System
4 Evaluation
4.1 Leveraging Domain Knowledge
4.1.1 Experimental Setup
4.1.2 Results and Discussion
4.2 Code Generator
4.2.1 Experimental Setup
4.2.2 Results and Discussion
4.3 Threats to Validity
5 Related Work
6 Conclusion and Future Work
References
Part III EXASTEEL: Bridging Scales for Multiphase Steels
One-Way and Fully-Coupled FE2 Methods for Heterogeneous Elasticity and Plasticity Problems: Parallel Scalability and an Application to Thermo-Elastoplasticity of Dual-Phase Steels
1 Introduction
2 Thermodynamic and Continuum Mechanical Framework
2.1 Incorporation of Thermo-mechanics
2.2 Implementation Using a Complex Step Derivative Approximation
3 Framework for Direct-Micro-Macro Computations
3.1 General Approach
3.2 Approaches for Multiphase-Steel Incorporating Thermo-mechanics
4 Numerical Examples for the One-Way FE2 Coupling
5 FE2TI: A Parallel Implementation of the Fully Coupled FE2 Approach
5.1 Implementation Remarks
5.2 Production Runs on the JUQUEEN Supercomputer
5.3 Strong Scalability on JUQUEEN
6 Conclusion
References
Scalability of Classical Algebraic Multigrid for Elasticity to Half a Million Parallel Tasks
1 Introduction
2 Algebraic Multigrid
3 Algebraic Multigrid for Systems of PDEs
4 The Global Matrix Approach
5 The Local Neighborhood Approach
6 Numerical Results
6.1 Results in Two Dimensions
6.2 Results in Three Dimensions
6.2.1 3D Beam Problem
6.2.2 3D Beam Problem with Double Length
6.2.3 3D Cuboid Problem
6.3 Parallel Problem Assembly and Reordering Process
7 Conclusions
References
Part IV EXAHD: An Exa-Scalable Two-Level Sparse Grid Approach for Higher-Dimensional Problems in Plasma Physics and Beyond
Recent Developments in the Theory and Application of the Sparse Grid Combination Technique
1 Introduction
2 A Class of Combination Techniques
3 Algorithms and Data Structures
4 Modified Combination Coefficients
5 Computing Eigenvalues and Eigenvectors
5.1 An Opticom Approach for Solving the Eigenvalue Problem
5.2 Iterative Refinement and Iterative Methods
6 Conclusions
References
Scalable Algorithms for the Solution of Higher-Dimensional PDEs
1 Introduction
1.1 Sparse Grid Combination Technique
1.2 Large Scale Plasma Turbulence Simulations with GENE
2 Software Framework for Large-Scale Computations with the Combination Technique
3 Scalable Algorithms for the Combination Step with Distributed Component Grids
3.1 Distributed Hierarchization/Dehierarchization
3.2 Local Reduction/Scatter of Component Grids Inside the Process Group
3.2.1 Variant 1: General Reduction of Distributed Component Grids
3.2.2 Variant 2: Communication-Free Local Reduction of Uniformly Parallelized Component Grids
3.3 Global Reduction of the Combination Solution
4 Results
5 Conclusion and Future Work
References
Handling Silent Data Corruption with the Sparse Grid Combination Technique
1 Introduction
1.1 Understanding Silent Data Corruption
1.2 Statement of the Problem
2 Basics of Sparse Grids
2.1 The Sparse Grid Combination Technique
3 The SGCT in Parallel and Fault Tolerance with the Combination Technique
3.1 SDC and the Combination Technique
3.2 Sanity Check 1: Filtering SDC via Comparison of Pairs of Solutions
3.3 Sanity Check 2: Filtering SDC via Outlier Detection
4 Numerical Tests
4.1 Experimental Setup
4.2 Results
5 Conclusions
References
Part V TERRA-NEO: Integrated Co-Design of an Exascale Earth Mantle Modeling Framework
Hybrid Parallel Multigrid Methods for Geodynamical Simulations
1 Introduction
2 Geodynamical Modeling
3 Discretization and Hybrid Parallel Multigrid Methods
3.1 Finite Element Discretization
3.2 Multigrid Solvers and the HHG Framework
4 Scalability and Performance of the Multigrid Method
4.1 Operator Counts
4.2 Scalability
4.3 Fault Tolerance
4.4 Performance
5 Application to the Earth's Upper Mantle
6 Simulations of the Coupled Problem
7 Conclusion
References
Part VI ExaFSA: Exascale Simulation of Fluid-Structure-Acoustics Interactions
Partitioned Fluid-Structure-Acoustics Interaction on Distributed Data: Coupling via preCICE
1 Introduction
2 Coupling Building Blocks on Distributed Data
2.1 Communication of Distributed Data
2.1.1 Surface Mesh Re-Partitioning
2.1.2 Point-to-Point Communication
2.2 Interpolation Methods on Distributed Data
2.2.1 Projection-Based Interpolation
2.2.2 Radial Basis Function (RBF) Interpolation
2.3 Fixed-Point Acceleration Methods on Distributed Data
2.3.1 Theory of Robust Quasi-Newton Fixed-Point Acceleration
2.3.2 Implementational Aspects of Quasi-Newton Coupling Iterations
3 Scalability Study
3.1 Testcase Description
3.2 Strong Scaling for n = 512^2 = 262,144
3.3 Strong Scaling for n = 128^2 = 16,384
3.4 Varying Problem Size n = 16, ..., 128
4 Conclusions
References
Partitioned Fluid-Structure-Acoustics Interaction on Distributed Data: Numerical Results and Visualization
1 Introduction
2 Description of the Individual Solvers
2.1 Fluid Dynamics in the Acoustic Near Field
2.1.1 OpenFOAM: Compressible Flow Solver
2.1.2 FASTEST: Incompressible Flow Solver
2.2 Acoustic Wave Propagation
2.2.1 FASTEST: Acoustic Near Field
2.2.2 Ateles: Acoustic Far Field
2.3 Structural Dynamics
2.3.1 OpenFOAM: Finite Volume Structure Solver
2.3.2 FEAP: Finite Element Structure Solver
3 Coupling
3.1 Coupling the Elastic Structure with the Acoustic Fluid
3.2 Coupling the Acoustic Near Field with the Far Field
3.3 Coupling the Incompressible Flow with Acoustic Perturbations
4 Visualization
4.1 In-Situ Visualization
4.2 Simulation-Visualization Setup
4.3 Intermediate Representation: Volumetric Depth Images
4.4 Visualization Transform and Render
5 The Three-Dimensional Bending Tower Testcase
5.1 Testcase Description
5.2 Numerical Results
5.3 Scaling Results
5.4 Visualization
6 Conclusion and Outlook
References
Part VII ESSEX: Equipping Sparse Solvers for Exascale
Towards an Exascale Enabled Sparse Solver Repository
1 Introduction
2 ESSR Architecture and Development Process
2.1 Software Architecture
2.2 Concurrent Development of all Layers
2.3 Integration of Performance Engineering
2.4 Fault Tolerance Strategy
3 ESSR Software Landscape
3.1 Hardware and Execution Models Supported
3.2 ESSR Toolkits and Functionality
3.3 Applications
3.4 Kernel Interface
3.5 Computational Core
3.6 Verifying Software Correctness and Performance
4 Algorithms Implemented in the ESSR
4.1 Algorithms Based on Chebyshev Polynomials
4.2 Beyond FEAST: Projection Based Methods
4.3 Block Jacobi-Davidson QR
5 Fault Tolerance
6 Summary and Outlook
References
Performance Engineering and Energy Efficiency of Building Blocks for Large, Sparse Eigenvalue Computations on Heterogeneous Supercomputers
1 Introduction
2 Contribution
3 Holistic Performance Engineering Driving Energy Efficiency on the Example of the Kernel Polynomial Method (KPM)
3.1 Performance Engineering for KPM
3.1.1 Sparse Matrix Data Format
3.1.2 Kernel Fusion and Blocking
3.2 Single-Socket Performance and Energy Analysis
3.2.1 Multi-Core Energy Modeling
3.2.2 Measurements
4 An Overview of GHOST
5 GHOST Applications
5.1 Density of States Computations Using KPM-DOS
5.2 Inner Eigenvalue Computation with Chebyshev Filter Diagonalization (ChebFD)
5.3 Block Jacobi-Davidson QR Method
6 Summary and Outlook
References
Part VIII DASH: Hierarchical Arrays for Efficient and Productive Data-Intensive Exascale Computing
Expressing and Exploiting Multi-Dimensional Locality in DASH
1 Introduction
2 Background
2.1 PGAS and Multi-dimensional Locality
2.2 DASH Concepts
2.2.1 Topology: Teams and Units
2.2.2 Data Distribution: Patterns
3 Classification of Pattern Properties
3.1 Partitioning Properties
3.2 Mapping Properties
3.3 Layout Properties
3.4 Global Properties
4 Exploiting Locality with Pattern Traits
4.1 Deducing Distribution Patterns from Constraints
4.2 Deducing Distribution Patterns for a Specific Use Case
4.3 Checking Distribution Constraints
4.4 Deducing Suitable Algorithm Variants
5 Performance Evaluation
5.1 Experimental Setup
5.2 Results
6 Related Work
7 Conclusion and Future Work
References
Tool Support for Developing DASH Applications
1 Introduction
2 Related Work
2.1 DASH
2.2 Debugging
2.3 Performance Analysis
3 Overview DASH
3.1 DART: The DASH Runtime
3.2 DASH: Distributed C++ Template Library
4 Debugging DASH Applications
5 Using Score-P to Analyze DASH and DART
5.1 DART
5.2 DASH
6 MPI Profiling
7 PAPI Support in DASH
7.1 The DASH Timer Class
7.2 Fallback Timer Implementations
8 Conclusion and Future Work
References
Part IX EXAMAG: Exascale Simulations of the Evolution of the Universe Including Magnetic Fields
Simulating Turbulence Using the Astrophysical Discontinuous Galerkin Code TENET
1 Introduction
2 Discontinuous Galerkin Methods
2.1 Basis Functions
2.2 Initial Conditions
2.3 Time Evolution Equations
2.4 Time Step Calculation
2.5 Positivity Limiter
3 Turbulence Simulations
3.1 Turbulence Driving
3.2 Dissipation Measurement
3.3 Power Spectrum Measurement
4 Results
4.1 Mach Number Evolution
4.2 Injected and Dissipated Energy
4.3 Velocity Power Spectra
4.4 Density PDFs
5 Discussion
References
Part X FFMK: A Fast and Fault-Tolerant Microkernel-Based System for Exascale Computing
FFMK: A Fast and Fault-Tolerant Microkernel-Based System for Exascale Computing
1 Exascale Challenges
2 FFMK Architecture Overview
3 Microkernel-Based Node OS
4 Dynamic Platform Management
4.1 Application Model
4.2 Monitoring and Gossip-Based Information Dissemination
4.3 Decision Making
5 MPI Runtime
5.1 MPI and Load Balancing
5.2 OS/R Support for Oversubscription
6 Migration
7 Fault Tolerance
8 Related Work
9 Summary and Future Work
References
Fast In-Memory Checkpointing with POSIX API for Legacy Exascale-Applications
1 Introduction
2 Related Work
3 In-Memory Checkpointing with POSIX API
3.1 Implementation with XtreemFS
3.2 Fault-Tolerance and Efficiency with Erasure Codes
4 Deployment on a Supercomputer
4.1 Access to RAM File System
4.1.1 Issues with LD_PRELOAD
4.2 Placement of Services
4.3 Deployment on a Cray XC40
5 Experimental Results
6 Summary
References
Part XI CATWALK: A Quick Development Path for Performance Models
Automatic Performance Modeling of HPC Applications
1 Motivation
2 Overview of Contributions
3 Automatic Empirical Performance Modeling
4 Scalability Validation Framework
5 Compiler-Driven Performance Modeling
6 Related Work
7 Conclusion
References
Automated Performance Modeling of the UG4 Simulation Framework
1 Introduction
2 The UG4 Simulation Framework
2.1 Concepts and Numerical Methods
2.2 Parallel Hierarchical Geometric Multigrid
2.3 Application: Human Skin Permeation
3 Automated Performance Modeling
4 Results
4.1 Analysis for Grid Hierarchy Setup and Solver Comparison
4.2 Scalability of Code Kernels in the Geometric Multigrid
5 Related Work
6 Conclusion
References
Part XII GROMEX: Unified Long-Range Electrostatics and Dynamic Protonation for Realistic Biomolecular Simulations on the Exascale
Accelerating an FMM-Based Coulomb Solver with GPUs
1 Introduction
2 Theoretical Background
2.1 The FMM Workflow
2.2 Mathematical Operators
2.2.1 Multipole-to-Multipole (M2M) Operator
2.3 Rotation-Based Operators
3 Existing Implementation
4 Application Layout
4.1 Custom Allocator
4.2 Pool Allocator
4.3 Merging the CPU and GPU Codebases
5 CUDA Implementation
5.1 Exposing Parallelism
5.2 Results
6 Conclusion
References
Part XIII ExaSolvers: Extreme Scale Solvers for Coupled Problems
Space and Time Parallel Multigrid for Optimization and Uncertainty Quantification in PDE Simulations
1 Introduction
2 Parallel Adaptive Multigrid
3 Empirically Determined Energy Optimal CPU Frequencies
3.1 Approach
3.2 Implementation Details
3.3 Evaluation
4 Parallel in Time Multigrid
5 Scalable Shape Optimization Methods for Structured Inverse Modeling in 3D Diffusive Processes
6 Uncertainty Quantification
7 Conclusion
References
Part XIV Further Contributions
Domain Overlap for Iterative Sparse Triangular Solves on GPUs
1 Introduction
2 Background and Related Work
2.1 Sparse Triangular Solves
2.2 Jacobi Method and Block-Asynchronous Iteration
2.3 Overlapping Domains and Restricted Additive Schwarz
3 Random-Order Alternating Schwarz
3.1 Domain Overlap Based on Matrix Partitioning
3.2 Directed Overlap
4 Restricted Overlap on GPUs
5 Experimental Results
5.1 Test Environment
5.2 Sparse Triangular Solves
6 Summary and Future Work
References
Asynchronous OpenCL/MPI Numerical Simulations of Conservation Laws
1 Introduction
2 Comparison of an OpenCL and an OpenMP Solver on a Regular Grid
2.1 FV Approximation of Conservation Laws
2.2 OpenMP Implementation of the FV Scheme
2.3 OpenCL Implementation of the FV Scheme
2.3.1 OpenCL
2.3.2 Implementation
2.4 OpenCL/MPI FV Solver
3 Asynchronous OpenCL/MPI Discontinuous Galerkin Solver
3.1 The DG Method
3.1.1 Interpolation on Unstructured Hexahedral Meshes
3.1.2 DG Formulation
3.2 OpenCL Kernel for a Single GPU
3.3 Asynchronous MPI/OpenCL Implementation for Several GPUs
3.3.1 Subdomains and Zones
3.3.2 Task Graph
3.4 Efficiency Analysis
3.5 Numerical Results
4 Conclusions
References
Editorial Policy
Lecture Notes in Computational Science and Engineering
Monographs in Computational Science and Engineering
Texts in Computational Science and Engineering
