
Accelerator Programming Using Directives
Description
The 7 full papers presented were carefully reviewed and selected from 13 submissions. The papers share knowledge and experience in programming emerging complex parallel computing systems. They are organized in three sections: porting scientific applications to heterogeneous architectures using directives; directive-based programming for math libraries; and performance portability for heterogeneous architectures.

Content
- Intro
- Preface
- Organization
- Contents
- Porting Scientific Applications to Heterogeneous Architectures Using Directives
- GPU Implementation of a Sophisticated Implicit Low-Order Finite Element Solver with FP21-32-64 Computation Using OpenACC
- 1 Introduction
- 2 Baseline Solver on CPU-based Computers
- 2.1 The Target Problem
- 2.2 The Solver Algorithm
- 2.3 Implementation of Solver for CPU Systems
- 3 GPU Implementation Using OpenACC
- 3.1 Baseline Implementation
- 3.2 Introduction of Lower-Precision Data Types
- 3.3 Miscellaneous Optimizations in the Solver
- 4 Performance Measurement
- 4.1 Performance Evaluation of FP21 Computation
- 4.2 Performance Evaluation of the Entire Solver
- 5 Conclusions
- References
- Acceleration in Acoustic Wave Propagation Modelling Using OpenACC/OpenMP and Its Hybrid for the Global Monitoring System
- 1 Introduction
- 2 Computing Environment
- 3 3D-SSFPE
- 3.1 Overview
- 3.2 Implementation
- 3.3 Performance Evaluation and Conclusion
- 4 Global Acoustic Simulation with FDM
- 4.1 Overview
- 4.2 Formulation
- 4.3 Yin-Yang Grid
- 4.4 Computational Schemes
- 4.5 Performance Optimization
- 4.6 Software Evaluation
- 5 Conclusion
- References
- Accelerating the Performance of Modal Aerosol Module of E3SM Using OpenACC
- 1 Introduction
- 2 OpenACC Programming Model
- 2.1 Parallelizing Loops
- 2.2 Data Transfer
- 3 Experimental Platforms and Approach
- 4 MAM Algorithms and Kernels
- 5 Offloading Computations to GPUs
- 5.1 Kernel: subgrid_mean_updraft
- 5.2 Kernel: hetfrz_classnuc_cam_calc
- 5.3 Kernel: ccncalc
- 5.4 Kernel: nsubmix
- 6 MAM Kernel Performance Discussion
- 6.1 Multi-Process Service (MPS)
- 6.2 Scaling Results
- 7 Summary and Conclusion
- References
- Evaluation of Directive-Based GPU Programming Models on a Block Eigensolver with Consideration of Large Sparse Matrices
- 1 Introduction
- 2 Background and Related Work
- 3 Methodology
- 3.1 The LOBPCG Algorithm
- 3.2 Baseline CPU Implementation
- 3.3 A GPU Implementation of LOBPCG
- 3.4 Tiling LOBPCG Kernels to Fit in GPU Memory Capacity
- 3.5 Hardware and Software Environment
- 3.6 Experiments
- 4 Results
- 4.1 Performance of the LOBPCG Solver
- 4.2 Performance of XTY and SpMM Kernels for Large Matrices
- 4.3 Performance of Tiled and Unified Memory Versions of SpMM
- 5 Discussion
- 6 Conclusions
- References
- Directive-Based Programming for Math Libraries
- Performance of the RI-MP2 Fortran Kernel of GAMESS on GPUs via Directive-Based Offloading with Math Libraries
- 1 Introduction
- 2 RI-MP2 Kernel of GAMESS
- 2.1 RI-MP2 Kernel
- 2.2 Inputs for the RI-MP2 Kernel from GAMESS
- 3 Employed Systems
- 3.1 Summit System at Oak Ridge Leadership Computing Facility
- 3.2 JLSE System at Argonne Leadership Computing Facility
- 4 Programming Environments
- 4.1 Employed Compilers
- 4.2 Math Libraries
- 5 Offloading the RI-MP2 Kernel
- 5.1 The RI-MP2 Kernel with OpenMP Threading
- 5.2 Offloading the RI-MP2 Kernels to GPUs via OpenMP 4.5 and OpenACC 2.6
- 5.3 Performance Results
- 6 Offloading the Restructured RI-MP2 Kernel
- 6.1 Restructuring the RI-MP2 Kernel for an Optimized Performance on a GPU
- 6.2 Performance Results of the Restructured RI-MP2 Kernel
- 7 Performance of the Restructured RI-MP2 Kernel on Multiple GPUs via MPI+OpenMP Offloading
- 8 Concluding Remarks
- References
- Performance Portability for Heterogeneous Architectures
- Performance Portable Implementation of a Kinetic Plasma Simulation Mini-App
- 1 Introduction
- 2 Testbed Description
- 3 GYSELA Mini-App and Baseline OpenMP Implementation
- 3.1 Four-Dimensional Vlasov-Poisson System
- 3.2 Algorithm
- 3.3 Baseline OpenMP Implementation
- 3.4 Characteristics of Kernels
- 4 GPU Implementation of GYSELA Mini-App
- 4.1 Kokkos Implementation of GYSELA Mini-application
- 4.2 OpenACC Implementation of GYSELA Mini-application
- 5 Portable Implementation of GYSELA Mini-App with OpenACC/OpenMP and Kokkos
- 5.1 Portable Implementation with Kokkos
- 5.2 Portable Implementation with OpenACC/OpenMP
- 5.3 Performance Comparison of Baseline Versions
- 5.4 3D MDRange Policy
- 6 Performance Portability, Readability and Productivity
- 6.1 Performance Evaluation
- 6.2 Readability
- 6.3 Productivity
- 7 Summary
- References
- A Portable SIMD Primitive Using Kokkos for Heterogeneous Architectures
- 1 Introduction
- 2 Related Work
- 3 Portable simd Primitive
- 4 Experiments
- 4.1 PDE Assembly
- 4.2 2D Convolution
- 4.3 Compact gemm
- 4.4 Embedded Ensemble Propagation
- 5 Results and Performance Analysis
- 5.1 PDE Assembly
- 5.2 2D Convolution
- 5.3 Compact gemm
- 5.4 Embedded Ensemble Propagation
- 6 Assigning the Optimal LVL Value
- 7 Conclusion and Future Work
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are cumbersome to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, there is a personalised watermark embedded in the eBook that can be used to identify the purchaser of the eBook in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.