
Programming Your GPU with OpenMP
Description
- The most up-to-date APIs for programming GPUs with OpenMP, with concepts that transfer to other approaches for GPU programming.
- Written in a tutorial style that embraces active learning, so that readers can make immediate use of what they learn via provided source code.
- Builds the OpenMP GPU Common Core to get programmers to serious production-level GPU programming as fast as possible.
- A reference guide at the end of the book covering all relevant parts of OpenMP 5.2.
- An online repository containing source code for the example programs from the book, provided in all languages currently supported by OpenMP: C, C++, and Fortran.
- Tutorial videos and lecture slides.

Persons
Timothy G. Mattson is a senior principal engineer at Intel, where he has worked since 1993 on the first TFLOP computer; the creation of MPI, OpenMP, and OpenCL; HW/SW co-design of many-core processors; data management systems; and the GraphBLAS API for expressing graph algorithms as sparse linear algebra.
Content
- Intro
- Contents
- Series Foreword
- Preface
- Acknowledgments
- I. Setting the Stage
- 1. Heterogeneity and the Future of Computing
- 1.1 The Basic Building Blocks of Modern Computing
- 1.1.1 The CPU
- 1.1.2 The SIMD Vector Unit
- 1.1.3 The GPU
- 1.2 OpenMP: A Single Code-Base for Heterogeneous Hardware
- 1.3 The Structure of This Book
- 1.4 Supplementary Materials
- 2. OpenMP Overview
- 2.1 Threads: Basic Concepts
- 2.2 OpenMP: Basic Syntax
- 2.3 The Fundamental Design Patterns of OpenMP
- 2.3.1 The SPMD Pattern
- 2.3.2 The Loop-Level Parallelism Pattern
- 2.3.3 The Divide-and-Conquer Pattern
- 2.3.3.1 Tasks in OpenMP
- 2.3.3.2 Parallelizing Divide-and-Conquer
- 2.4 Task Execution
- 2.5 Our Journey Ahead
- II. The GPU Common Core
- 3. Running Parallel Code on a GPU
- 3.1 Target Construct: Offloading Execution onto a Device
- 3.2 Moving Data between the Host and a Device
- 3.2.1 Scalar Variables
- 3.2.2 Arrays on the Stack
- 3.2.3 Derived Types
- 3.3 Parallel Execution on the Target Device
- 3.4 Concurrency and the Loop Construct
- 3.5 Example: Walking through Matrix Multiplication
- 4. Memory Movement
- 4.1 OpenMP Array Syntax
- 4.2 Sharing Data Explicitly with the Map Clause
- 4.2.1 The Map Clause
- 4.2.2 Example: Vector Add on the Heap
- 4.2.3 Example: Mapping Arrays in Matrix Multiplication
- 4.3 Reductions and Mapping the Result from the Device
- 4.4 Optimizing Data Movement
- 4.4.1 Target Data Construct
- 4.4.2 Target Update Directive
- 4.4.3 Target Enter/Exit Data
- 4.4.4 Pointer Swapping
- 4.5 Summary
- 5. Using the GPU Common Core
- 5.1 Recap of the GPU Common Core
- 5.2 The Eightfold Path to Performance
- 5.2.1 Portability
- 5.2.2 Libraries
- 5.2.3 The Right Algorithm
- 5.2.4 Occupancy
- 5.2.5 Converged Execution Flow
- 5.2.6 Data Movement
- 5.2.7 Memory Coalescence
- 5.2.8 Load Balance
- 5.3 Concluding the GPU Common Core
- III. Beyond the Common Core
- 6. Managing a GPU's Hierarchical Parallelism
- 6.1 Parallel Threads
- 6.2 League of Teams of Threads
- 6.2.1 Controlling the Number of Teams and Threads
- 6.2.2 Distributing Work between Teams
- 6.3 Hierarchical Parallelism in Practice
- 6.3.1 Example: Batched Matrix Multiplication
- 6.3.2 Example: Batched Gaussian Elimination
- 6.4 Hierarchical Parallelism and the Loop Directive
- 6.4.1 Combined Constructs that Include Loop
- 6.4.2 Reductions and Combined Constructs
- 6.4.3 The Bind Clause
- 6.5 Summary
- 7. Revisiting Data Movement
- 7.1 Manipulating the Device Data Environment
- 7.1.1 Allocating and Deleting Variables
- 7.1.2 Map Type Modifiers
- 7.1.3 Changing the Default Mapping
- 7.2 Compiling External Functions and Static Variables for the Device
- 7.3 User-Defined Mappers
- 7.4 Team-Only Memory
- 7.5 Becoming a Cartographer: Mapping Device Memory by Hand
- 7.6 Unified Shared Memory for Productivity
- 7.7 Summary
- 8. Asynchronous Offload to Multiple GPUs
- 8.1 Device Discovery
- 8.2 Selecting a Default Device
- 8.3 Offload to Multiple Devices
- 8.3.1 Reverse Offload
- 8.4 Conditional Offload
- 8.5 Asynchronous Offload
- 8.5.1 Task Dependencies
- 8.5.2 Asynchronous Data Transfers
- 8.5.3 Task Reductions
- 8.6 Summary
- 9. Working with External Runtime Environments
- 9.1 Calling External Library Routines from OpenMP
- 9.2 Sharing OpenMP Data with Foreign Functions
- 9.2.1 The Need for Synchronization
- 9.2.2 Example: Sharing OpenMP Data with cuBLAS
- 9.3 Using Data from a Foreign Runtime with OpenMP
- 9.3.1 Example: Sharing cuBLAS Data with OpenMP
- 9.3.2 Avoiding Unportable Code
- 9.4 Direct Control of Foreign Runtimes
- 9.4.1 Query Properties of the Foreign Runtime
- 9.4.2 Using the Interop Construct to Correctly Synchronize with Foreign Functions
- 9.4.3 Non-blocking Synchronization with a Foreign Runtime
- 9.4.4 Example: Calling CUDA Kernels without Blocking
- 9.5 Enhanced Portability Using Variant Directives
- 9.5.1 Declaring Function Variants
- 9.5.1.1 OpenMP Context and the Match Clause
- 9.5.1.2 Modifying Variant Function Arguments
- 9.5.2 Controlling Variant Substitution with the Dispatch Construct
- 9.5.3 Putting It All Together
- 10. OpenMP and the Future of Heterogeneous Computing
- Appendix: Reference Guide
- A.1 Programming a CPU with OpenMP
- A.2 Directives and Constructs for the GPU
- A.2.1 Parallelism with Loop, Teams, and Worksharing Constructs
- A.2.2 Constructs for Interoperability
- A.2.3 Constructs for Device Data Environment Manipulation
- A.3 Combined Constructs
- A.4 Internal Control Variables, Environment Variables, and OpenMP API Functions
- Glossary
- References
- Subject Index
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books – i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise any reading software with your personal Adobe ID after installing it.
For more information, see our ebook Help page.