
Software for Exascale Computing - SPPEXA 2013-2015
Description

Persons
Content
- Intro
- Preface
- Contents
- Part I EXA-DUNE: Flexible PDE Solvers, Numerical Methods, and Applications
- Hardware-Based Efficiency Advances in the Exa-Dune Project
- 1 The Exa-Dune Project
- 2 Hybrid Parallelism in DUNE
- 2.1 UMA Concept
- 3 Assembly
- 3.1 Thread Parallel Assembly
- 3.2 Higher Order DG Methods
- 3.3 Low Order Lagrange Methods
- 4 Linear Algebra
- 4.1 Efficient Matrix Format for Higher Order DG
- 4.2 GPU Accelerated Preconditioners and Strong Smoothers
- 5 Outlook
- References
- Advances Concerning Multiscale Methods and Uncertainty Quantification in Exa-Dune
- 1 Introduction
- 2 Numerical Multiscale Methods: A Case of Generality
- 2.1 The Multiscale Finite Element Method for Multiscale Elliptic Equations
- 2.2 Implementation and Parallelization
- 2.3 Hybrid MPI/SMP Implementation
- 3 The Multi-level Monte-Carlo Method
- 3.1 Principle
- 3.2 Implementation
- 4 Numerical Experiments
- 5 Conclusion
- References
- Part II ExaStencils: Advanced Stencil-Code Engineering
- Systems of Partial Differential Equations in ExaSlang
- 1 Introduction
- 2 Multigrid Methods
- 3 The ExaStencils Approach
- 4 The ExaStencils DSL ExaSlang
- 4.1 Multi-layered Approach
- 4.2 Overview of ExaSlang 4
- 4.2.1 Stencils
- 4.2.2 Fields and Layouts
- 4.2.3 Data Types, Variables, and Values
- 4.2.4 Control Flow
- 4.2.5 Level Specifications
- 5 Code Generation
- 6 Data Types for Systems of Partial Differential Equations
- 6.1 Motivation
- 6.2 The ExaSlang Data Types
- 7 Modifications to the Code Generator
- 8 Example Application
- 8.1 Theoretical Background
- 8.2 Mapping to ExaSlang 4
- 8.3 Results
- 9 Related Work
- 10 Future Work
- 11 Conclusions
- References
- Performance Prediction of Multigrid-Solver Configurations
- 1 Introduction
- 2 Configurable Multigrid Solvers and the ExaStencils Code Generator
- 3 Performance Prediction
- 3.1 Sampling
- 3.1.1 Binary Sampling Heuristics
- 3.1.2 Experimental Designs
- 3.2 Performance-Influence Models
- 3.3 Integration of Domain Knowledge
- 3.3.1 Shrinking the Configuration Space
- 3.3.2 Domain Knowledge on Interactions
- 3.3.3 Independent Sampling Strategies and Independent Models
- 3.3.4 Integration of Analytical Models
- 3.3.5 Models for Disjoint Parts of a System
- 4 Evaluation
- 4.1 Leveraging Domain Knowledge
- 4.1.1 Experimental Setup
- 4.1.2 Results and Discussion
- 4.2 Code Generator
- 4.2.1 Experimental Setup
- 4.2.2 Results and Discussion
- 4.3 Threats to Validity
- 5 Related Work
- 6 Conclusion and Future Work
- References
- Part III EXASTEEL: Bridging Scales for Multiphase Steels
- One-Way and Fully-Coupled FE2 Methods for Heterogeneous Elasticity and Plasticity Problems: Parallel Scalability and an Application to Thermo-Elastoplasticity of Dual-Phase Steels
- 1 Introduction
- 2 Thermodynamic and Continuum Mechanical Framework
- 2.1 Incorporation of Thermo-mechanics
- 2.2 Implementation Using a Complex Step Derivative Approximation
- 3 Framework for Direct-Micro-Macro Computations
- 3.1 General Approach
- 3.2 Approaches for Multiphase-Steel Incorporating Thermo-mechanics
- 4 Numerical Examples for the One-Way FE2 Coupling
- 5 FE2TI: A Parallel Implementation of the Fully Coupled FE2 Approach
- 5.1 Implementation Remarks
- 5.2 Production Runs on the JUQUEEN Supercomputer
- 5.3 Strong Scalability on JUQUEEN
- 6 Conclusion
- References
- Scalability of Classical Algebraic Multigrid for Elasticity to Half a Million Parallel Tasks
- 1 Introduction
- 2 Algebraic Multigrid
- 3 Algebraic Multigrid for Systems of PDEs
- 4 The Global Matrix Approach
- 5 The Local Neighborhood Approach
- 6 Numerical Results
- 6.1 Results in Two Dimensions
- 6.2 Results in Three Dimensions
- 6.2.1 3D Beam Problem
- 6.2.2 3D Beam Problem with Double Length
- 6.2.3 3D Cuboid Problem
- 6.3 Parallel Problem Assembly and Reordering Process
- 7 Conclusions
- References
- Part IV EXAHD: An Exa-Scalable Two-Level Sparse Grid Approach for Higher-Dimensional Problems in Plasma Physics and Beyond
- Recent Developments in the Theory and Application of the Sparse Grid Combination Technique
- 1 Introduction
- 2 A Class of Combination Techniques
- 3 Algorithms and Data Structures
- 4 Modified Combination Coefficients
- 5 Computing Eigenvalues and Eigenvectors
- 5.1 An Opticom Approach for Solving the Eigenvalue Problem
- 5.2 Iterative Refinement and Iterative Methods
- 6 Conclusions
- References
- Scalable Algorithms for the Solution of Higher-Dimensional PDEs
- 1 Introduction
- 1.1 Sparse Grid Combination Technique
- 1.2 Large Scale Plasma Turbulence Simulations with GENE
- 2 Software Framework for Large-Scale Computations with the Combination Technique
- 3 Scalable Algorithms for the Combination Step with Distributed Component Grids
- 3.1 Distributed Hierarchization/Dehierarchization
- 3.2 Local Reduction/Scatter of Component Grids Inside the Process Group
- 3.2.1 Variant 1: General Reduction of Distributed Component Grids
- 3.2.2 Variant 2: Communication-Free Local Reduction of Uniformly Parallelized Component Grids
- 3.3 Global Reduction of the Combination Solution
- 4 Results
- 5 Conclusion and Future Work
- References
- Handling Silent Data Corruption with the Sparse Grid Combination Technique
- 1 Introduction
- 1.1 Understanding Silent Data Corruption
- 1.2 Statement of the Problem
- 2 Basics of Sparse Grids
- 2.1 The Sparse Grid Combination Technique
- 3 The SGCT in Parallel and Fault Tolerance with the Combination Technique
- 3.1 SDC and the Combination Technique
- 3.2 Sanity Check 1: Filtering SDC via Comparison of Pairs of Solutions
- 3.3 Sanity Check 2: Filtering SDC via Outlier Detection
- 4 Numerical Tests
- 4.1 Experimental Setup
- 4.2 Results
- 5 Conclusions
- References
- Part V TERRA-NEO: Integrated Co-Design of an Exascale Earth Mantle Modeling Framework
- Hybrid Parallel Multigrid Methods for Geodynamical Simulations
- 1 Introduction
- 2 Geodynamical Modeling
- 3 Discretization and Hybrid Parallel Multigrid Methods
- 3.1 Finite Element Discretization
- 3.2 Multigrid Solvers and the HHG Framework
- 4 Scalability and Performance of the Multigrid Method
- 4.1 Operator Counts
- 4.2 Scalability
- 4.3 Fault Tolerance
- 4.4 Performance
- 5 Application to the Earth's Upper Mantle
- 6 Simulations of the Coupled Problem
- 7 Conclusion
- References
- Part VI ExaFSA: Exascale Simulation of Fluid-Structure-Acoustics Interactions
- Partitioned Fluid-Structure-Acoustics Interaction on Distributed Data: Coupling via preCICE
- 1 Introduction
- 2 Coupling Building Blocks on Distributed Data
- 2.1 Communication of Distributed Data
- 2.1.1 Surface Mesh Re-Partitioning
- 2.1.2 Point-to-Point Communication
- 2.2 Interpolation Methods on Distributed Data
- 2.2.1 Projection-Based Interpolation
- 2.2.2 Radial Basis Function (RBF) Interpolation
- 2.3 Fixed-Point Acceleration Methods on Distributed Data
- 2.3.1 Theory of Robust Quasi-Newton Fixed-Point Acceleration
- 2.3.2 Implementational Aspects of Quasi-Newton Coupling Iterations
- 3 Scalability Study
- 3.1 Testcase Description
- 3.2 Strong Scaling for n = 512² = 262,144
- 3.3 Strong Scaling for n = 128² = 16,384
- 3.4 Varying Problem Size n=16,...,128
- 4 Conclusions
- References
- Partitioned Fluid-Structure-Acoustics Interaction on Distributed Data: Numerical Results and Visualization
- 1 Introduction
- 2 Description of the Individual Solvers
- 2.1 Fluid Dynamics in the Acoustic Near Field
- 2.1.1 OpenFOAM: Compressible Flow Solver
- 2.1.2 FASTEST: Incompressible Flow Solver
- 2.2 Acoustic Wave Propagation
- 2.2.1 FASTEST: Acoustic Near Field
- 2.2.2 Ateles: Acoustic Far Field
- 2.3 Structural Dynamics
- 2.3.1 OpenFOAM: Finite Volume Structure Solver
- 2.3.2 FEAP: Finite Element Structure Solver
- 3 Coupling
- 3.1 Coupling the Elastic Structure with the Acoustic Fluid
- 3.2 Coupling the Acoustic Near Field with the Far Field
- 3.3 Coupling the Incompressible Flow with Acoustic Perturbations
- 4 Visualization
- 4.1 In-Situ Visualization
- 4.2 Simulation-Visualization Setup
- 4.3 Intermediate Representation: Volumetric Depth Images
- 4.4 Visualization Transform and Render
- 5 The Three-Dimensional Bending Tower Testcase
- 5.1 Testcase Description
- 5.2 Numerical Results
- 5.3 Scaling Results
- 5.4 Visualization
- 6 Conclusion and Outlook
- References
- Part VII ESSEX: Equipping Sparse Solvers for Exascale
- Towards an Exascale Enabled Sparse Solver Repository
- 1 Introduction
- 2 ESSR Architecture and Development Process
- 2.1 Software Architecture
- 2.2 Concurrent Development of all Layers
- 2.3 Integration of Performance Engineering
- 2.4 Fault Tolerance Strategy
- 3 ESSR Software Landscape
- 3.1 Hardware and Execution Models Supported
- 3.2 ESSR Toolkits and Functionality
- 3.3 Applications
- 3.4 Kernel Interface
- 3.5 Computational Core
- 3.6 Verifying Software Correctness and Performance
- 4 Algorithms Implemented in the ESSR
- 4.1 Algorithms Based on Chebyshev Polynomials
- 4.2 Beyond FEAST: Projection Based Methods
- 4.3 Block Jacobi-Davidson QR
- 5 Fault Tolerance
- 6 Summary and Outlook
- References
- Performance Engineering and Energy Efficiency of Building Blocks for Large, Sparse Eigenvalue Computations on Heterogeneous Supercomputers
- 1 Introduction
- 2 Contribution
- 3 Holistic Performance Engineering Driving Energy Efficiency on the Example of the Kernel Polynomial Method (KPM)
- 3.1 Performance Engineering for KPM
- 3.1.1 Sparse Matrix Data Format
- 3.1.2 Kernel Fusion and Blocking
- 3.2 Single-Socket Performance and Energy Analysis
- 3.2.1 Multi-Core Energy Modeling
- 3.2.2 Measurements
- 4 An Overview of GHOST
- 5 GHOST Applications
- 5.1 Density of States Computations Using KPM-DOS
- 5.2 Inner Eigenvalue Computation with Chebyshev Filter Diagonalization (ChebFD)
- 5.3 Block Jacobi-Davidson QR Method
- 6 Summary and Outlook
- References
- Part VIII DASH: Hierarchical Arrays for Efficient and Productive Data-Intensive Exascale Computing
- Expressing and Exploiting Multi-Dimensional Locality in DASH
- 1 Introduction
- 2 Background
- 2.1 PGAS and Multi-dimensional Locality
- 2.2 DASH Concepts
- 2.2.1 Topology: Teams and Units
- 2.2.2 Data Distribution: Patterns
- 3 Classification of Pattern Properties
- 3.1 Partitioning Properties
- 3.2 Mapping Properties
- 3.3 Layout Properties
- 3.4 Global Properties
- 4 Exploiting Locality with Pattern Traits
- 4.1 Deducing Distribution Patterns from Constraints
- 4.2 Deducing Distribution Patterns for a Specific Use Case
- 4.3 Checking Distribution Constraints
- 4.4 Deducing Suitable Algorithm Variants
- 5 Performance Evaluation
- 5.1 Experimental Setup
- 5.2 Results
- 6 Related Work
- 7 Conclusion and Future Work
- References
- Tool Support for Developing DASH Applications
- 1 Introduction
- 2 Related Work
- 2.1 DASH
- 2.2 Debugging
- 2.3 Performance Analysis
- 3 Overview DASH
- 3.1 DART: The DASH Runtime
- 3.2 DASH: Distributed C++ Template Library
- 4 Debugging DASH Applications
- 5 Using Score-P to Analyze DASH and DART
- 5.1 DART
- 5.2 DASH
- 6 MPI Profiling
- 7 PAPI Support in DASH
- 7.1 The DASH Timer Class
- 7.2 Fallback Timer Implementations
- 8 Conclusion and Future Work
- References
- Part IX EXAMAG: Exascale Simulations of the Evolution of the Universe Including Magnetic Fields
- Simulating Turbulence Using the Astrophysical Discontinuous Galerkin Code TENET
- 1 Introduction
- 2 Discontinuous Galerkin Methods
- 2.1 Basis Functions
- 2.2 Initial Conditions
- 2.3 Time Evolution Equations
- 2.4 Time Step Calculation
- 2.5 Positivity Limiter
- 3 Turbulence Simulations
- 3.1 Turbulence Driving
- 3.2 Dissipation Measurement
- 3.3 Power Spectrum Measurement
- 4 Results
- 4.1 Mach Number Evolution
- 4.2 Injected and Dissipated Energy
- 4.3 Velocity Power Spectra
- 4.4 Density PDFs
- 5 Discussion
- References
- Part X FFMK: A Fast and Fault-Tolerant Microkernel-Based System for Exascale Computing
- FFMK: A Fast and Fault-Tolerant Microkernel-Based System for Exascale Computing
- 1 Exascale Challenges
- 2 FFMK Architecture Overview
- 3 Microkernel-Based Node OS
- 4 Dynamic Platform Management
- 4.1 Application Model
- 4.2 Monitoring and Gossip-Based Information Dissemination
- 4.3 Decision Making
- 5 MPI Runtime
- 5.1 MPI and Load Balancing
- 5.2 OS/R Support for Oversubscription
- 6 Migration
- 7 Fault Tolerance
- 8 Related Work
- 9 Summary and Future Work
- References
- Fast In-Memory Checkpointing with POSIX API for Legacy Exascale-Applications
- 1 Introduction
- 2 Related Work
- 3 In-Memory Checkpointing with POSIX API
- 3.1 Implementation with XtreemFS
- 3.2 Fault-Tolerance and Efficiency with Erasure Codes
- 4 Deployment on a Supercomputer
- 4.1 Access to RAM File System
- 4.1.1 Issues with LD_PRELOAD
- 4.2 Placement of Services
- 4.3 Deployment on a Cray XC40
- 5 Experimental Results
- 6 Summary
- References
- Part XI CATWALK: A Quick Development Path for Performance Models
- Automatic Performance Modeling of HPC Applications
- 1 Motivation
- 2 Overview of Contributions
- 3 Automatic Empirical Performance Modeling
- 4 Scalability Validation Framework
- 5 Compiler-Driven Performance Modeling
- 6 Related Work
- 7 Conclusion
- References
- Automated Performance Modeling of the UG4 Simulation Framework
- 1 Introduction
- 2 The UG4 Simulation Framework
- 2.1 Concepts and Numerical Methods
- 2.2 Parallel Hierarchical Geometric Multigrid
- 2.3 Application: Human Skin Permeation
- 3 Automated Performance Modeling
- 4 Results
- 4.1 Analysis for Grid Hierarchy Setup and Solver Comparison
- 4.2 Scalability of Code Kernels in the Geometric Multigrid
- 5 Related Work
- 6 Conclusion
- References
- Part XII GROMEX: Unified Long-Range Electrostatics and Dynamic Protonation for Realistic Biomolecular Simulations on the Exascale
- Accelerating an FMM-Based Coulomb Solver with GPUs
- 1 Introduction
- 2 Theoretical Background
- 2.1 The FMM Workflow
- 2.2 Mathematical Operators
- 2.2.1 Multipole-to-Multipole (M2M) Operator
- 2.3 Rotation-Based Operators
- 3 Existing Implementation
- 4 Application Layout
- 4.1 Custom Allocator
- 4.2 Pool Allocator
- 4.3 Merging the CPU and GPU Codebases
- 5 CUDA Implementation
- 5.1 Exposing Parallelism
- 5.2 Results
- 6 Conclusion
- References
- Part XIII ExaSolvers: Extreme Scale Solvers for Coupled Problems
- Space and Time Parallel Multigrid for Optimization and Uncertainty Quantification in PDE Simulations
- 1 Introduction
- 2 Parallel Adaptive Multigrid
- 3 Empirically Determined Energy Optimal CPU Frequencies
- 3.1 Approach
- 3.2 Implementation Details
- 3.3 Evaluation
- 4 Parallel in Time Multigrid
- 5 Scalable Shape Optimization Methods for Structured Inverse Modeling in 3D Diffusive Processes
- 6 Uncertainty Quantification
- 7 Conclusion
- References
- Part XIV Further Contributions
- Domain Overlap for Iterative Sparse Triangular Solves on GPUs
- 1 Introduction
- 2 Background and Related Work
- 2.1 Sparse Triangular Solves
- 2.2 Jacobi Method and Block-Asynchronous Iteration
- 2.3 Overlapping Domains and Restricted Additive Schwarz
- 3 Random-Order Alternating Schwarz
- 3.1 Domain Overlap Based on Matrix Partitioning
- 3.2 Directed Overlap
- 4 Restricted Overlap on GPUs
- 5 Experimental Results
- 5.1 Test Environment
- 5.2 Sparse Triangular Solves
- 6 Summary and Future Work
- References
- Asynchronous OpenCL/MPI Numerical Simulations of Conservation Laws
- 1 Introduction
- 2 Comparison of an OpenCL and an OpenMP Solver on a Regular Grid
- 2.1 FV Approximation of Conservation Laws
- 2.2 OpenMP Implementation of the FV Scheme
- 2.3 OpenCL Implementation of the FV Scheme
- 2.3.1 OpenCL
- 2.3.2 Implementation
- 2.4 OpenCL/MPI FV Solver
- 3 Asynchronous OpenCL/MPI Discontinuous Galerkin Solver
- 3.1 The DG Method
- 3.1.1 Interpolation on Unstructured Hexahedral Meshes
- 3.1.2 DG Formulation
- 3.2 OpenCL Kernel for a Single GPU
- 3.3 Asynchronous MPI/OpenCL Implementation for Several GPUs
- 3.3.1 Subdomains and Zones
- 3.3.2 Task Graph
- 3.4 Efficiency Analysis
- 3.5 Numerical Results
- 4 Conclusions
- References
- Editorial Policy
- Lecture Notes in Computational Science and Engineering
- Monographs in Computational Science and Engineering
- Texts in Computational Science and Engineering
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, a personalised watermark is embedded in the eBook; it can be used to identify the purchaser in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.