Accelerated Computing with HIP

Name: Accelerated Computing with HIP
Brand: Sun, Baruah and Kaeli
Price: 15.49 EUR
Availability: OnlineOnly

Yifan Sun, Trinayan Baruah, David Kaeli (Authors)

Sun, Baruah and Kaeli (Publisher)

Published on 9 December 2022

226 pages

E-Book

ePUB with Adobe-DRM

979-8-218-10745-1 (ISBN)

€15.49 incl. 7% VAT

System requirements

for ePUB with Adobe-DRM

E-Book Single Licence

Available for download

Content

Intro
Title Page
Copyright Page
Contents
Foreword
Preface
Acknowledgements
1. Introduction
1.1 Parallel Programming
1.2 GPUs
1.3 ROCm
1.4 HIP Framework
1.5 What This Book Covers
2. HIP Language
2.1 Introduction
2.2 "Hello World" in HIP
2.3 Compilation and Execution
2.4 HIP Runtime API
2.5 HIP Kernel Launch
2.6 HIP Program Structure
2.7 Error and Correctness Checking
2.8 Organizing Threads
2.9 vector_add in HIP
2.10 Conclusion
3. AMD GPU Internals
3.1 AMD GPUs
3.2 Overall Architecture
3.3 Command Processor and the DMA Engine
3.4 Workgroup Dispatching
3.5 Sequencer
3.6 SIMD Unit
3.7 Thread Divergence
3.8 Memory Coalescing
3.9 Memory Hierarchy
3.10 Conclusion
4. HIP Tools for Performance Analysis & Debug
4.1 ROCm Profiler
4.2 ROCm Debugger
4.3 ROCm SMI
4.4 Conclusion
5. HIP Programming Patterns
5.1 Highly Parallel Workload - Image Gamma Correction
5.2 Multidimensional Kernels - Stencil
5.3 Fixed-Sized Kernels - Image Gamma Correction
5.4 Reduce - Array Sum
5.5 Tiling & Reuse - Matrix Multiplication
5.6 Tiling & Coalescing: Matrix Transpose
5.7 Kernel-Level Synchronization: BFS
5.8 Conclusion
6. HIP Streams
6.1 Basics of Streams
6.2 Important Stream-Based APIs
6.3 Execution of HIP Streams on GPU Hardware
6.4 Default and Non-Default Streams
6.5 Concurrent Kernels
6.6 Overlapping Computation and Communication
6.7 Conclusion
7. ROCm Libraries
7.1 rocBLAS
7.1.1 Using rocBLAS
7.1.2 rocBLAS functions
7.1.3 Asynchronous execution
7.1.4 rocBLAS on MI100
7.1.5 Porting from the legacy BLAS library
7.2 rocSPARSE
7.2.1 Sparse data representation
7.2.2 rocSPARSE functions
7.3 rocFFT
7.3.1 rocFFT workflow
7.3.2 FFT execution plan
7.4 rocRAND
7.5 Conclusion
8. Porting CUDA Programs to HIP
8.1 Hipify Tools
8.1.1 Hipify-clang
8.1.2 Hipify-perl
8.2 General Hipifying Guidelines
8.3 Hipification of Matrix-Transpose
8.4 Common Pitfalls and Solutions
8.5 Conclusion
9. Multi-GPU Programming
9.1 HIP Device APIs
9.2 Stream-Based Multi-GPU Programming
9.3 Thread-Based Multi-GPU Programming
9.4 MPI-Based Multi-GPU Programming
9.5 GPU-GPU Communication
9.6 RCCL
9.6.1 Broadcast
9.6.2 AllReduce
9.7 Conclusion
10. ROCm in Datacenters
10.1 Containerized ROCm
10.2 Managing ROCm Containers with Kubernetes
10.3 Managing ROCm Nodes with SLURM
10.3.1 SLURM interactive mode
10.3.2 SLURM batch submission mode
10.4 Conclusion
11. Third-Party Tools
11.1 PAPI
11.1.1 Introduction
11.1.2 PAPI utilities and tests
11.1.3 PAPI support for AMD GPUs
11.1.4 Preset events and Counter Analysis Toolkit (CAT)
11.2 Score-P and Vampir
11.2.1 Overview
11.2.2 Tracing with Score-P
11.2.3 Score-P usage
11.2.4 Profiling the Quicksilver application
11.2.5 Summary
11.3 Trace Compass and Theia
11.4 TAU
11.4.1 Profiling HIP programs with TAU
11.4.2 Tracing HIP programs with TAU
11.4.3 Using APEX to measure HIP programs
11.4.4 Summary of TAU
11.5 TotalView Debugger
11.6 HPCToolkit
11.6.1 HPCToolkit's workflow
11.6.2 Analyzing PIConGPU with HPCToolkit
11.6.3 Collecting and analyzing profiles and traces
11.6.4 Measurement using hardware counters
11.7 E4S - Extreme Scale Scientific Software Stack
11.7.1 E4S release
A. CDNA Assembly
A.1 Using CDNA Assembly Code
A.1.1 Retrieve HIP kernel binary
A.1.2 Disassembling a CDNA binary
A.2 CDNA Registers
A.3 Instruction Types
A.4 Memory Access Instructions
A.5 Example: Shifted Copy
A.6 Example: Branching
A.7 Conclusion
B. ML with ROCm
B.1 PyTorch on ROCm
B.2 TensorFlow on ROCm
B.3 Conclusion
Bibliography
