
OpenMP: Memory, Devices, and Tasks
Description
The 24 full papers presented in this volume were carefully reviewed and selected from 28 submissions. They were organized in topical sections named: applications, locality, task parallelism, extensions, tools, accelerator programming, and performance evaluations and optimization.

Content
- Intro
- Preface
- Organization
- Contents
- Applications
- Estimation of Round-off Errors in OpenMP Codes
- 1 Introduction
- 2 The CADNA Library
- 2.1 Principles of DSA (Discrete Stochastic Arithmetic)
- 2.2 Numerical Validation of Sequential Codes Using CADNA
- 3 CADNA for OpenMP Codes
- 4 Performance Tests and Application to OpenMP Codes
- 4.1 A Reduction Code
- 4.2 Performance Tests
- 4.3 A Shallow-Water Application
- 5 Conclusion
- References
- OpenMP Parallelization and Optimization of Graph-Based Machine Learning Algorithms
- 1 Introduction
- 2 Graph-Based Classification Algorithms
- 2.1 Introduction
- 2.2 Semi-supervised and Unsupervised Algorithms
- 2.3 Nyström Extension Method
- 3 Math Library Usage and Optimizations
- 4 Parallelization of the Nyström Extension
- 4.1 OpenMP Parallelization
- 4.2 Arithmetic Intensity and Roofline Model
- 5 Conclusion and Future Work
- References
- Locality
- Evaluating OpenMP Affinity on the POWER8 Architecture
- 1 Introduction
- 1.1 Memory Placement
- 1.2 Thread Affinity
- 1.3 POWER8 System
- 1.4 POWER8 Hardware Counters
- 2 Motivation
- 3 Experimentation
- 3.1 Experimental Setup
- 4 Results
- 5 Related Work
- 6 Conclusions and Future Work
- References
- Workstealing and Nested Parallelism in SMP Systems
- 1 Introduction
- 2 Terminology
- 3 Static Workstealing Scheduler
- 3.1 Motivation
- 3.2 Scheduler Implementation
- 3.3 OpenMP Scheduler Constraints
- 4 ISO-3DFD Test Code
- 5 ISO-3DFD Optimization
- 5.1 Nested Parallelism vs. Hand Threading
- 5.2 Performance Results
- 6 OpenMP Extension to Loop Scheduling
- 6.1 Hierarchical Loop Scheduling
- 6.2 Multi-dimensional Chunking
- 6.3 Example
- 7 Static Workstealing and Particle Codes
- 7.1 Application of Static Workstealing
- 8 Conclusions and Future Work
- References
- Description, Implementation and Evaluation of an Affinity Clause for Task Directives
- 1 Introduction
- 2 Motivating Examples for Which Affinity Does Matter
- 3 Extending OpenMP to Support Affinities
- 3.1 Extension of the OpenMP Task Directive
- 3.2 Extension of the OpenMP Runtime API Functions
- 3.3 Extension of the Task Scheduler to Support Affinity
- 4 Examples of Use and Experimentation Results
- 4.1 Enhancing Task-Based OpenMP Kernels to Support affinity
- 4.2 Experimental Platform Description
- 4.3 Experimental Results
- 5 Related Work
- 6 Conclusion
- References
- Task Parallelism
- NUMA-Aware Task Performance Analysis
- 1 Introduction
- 2 Related Work
- 3 NUMA-Aware Task Creation
- 3.1 Task Scheduling in OpenMP
- 3.2 Task Creation
- 3.3 Benchmark Evaluation
- 4 Task Performance Analysis
- 4.1 Gathered Data
- 4.2 Data Analysis
- 5 Evaluation
- 6 Conclusion
- References
- OpenMP Extension for Explicit Task Allocation on NUMA Architecture
- 1 Introduction
- 2 Related Work
- 3 OpenMP Extension for NUMA-Aware Task Allocation
- 3.1 Overview
- 3.2 Language Definition
- 3.3 Prototype Implementation Using GCC
- 4 KASTOR Kernel Optimization with node_bind
- 4.1 Jacobi Kernel
- 4.2 SparseLU Kernel
- 4.3 Strassen Kernel
- 5 Performance Evaluation
- 5.1 Result of Jacobi Kernel
- 5.2 Result of SparseLU Kernel
- 5.3 Result of Strassen Kernel
- 6 Conclusion
- References
- Approaches for Task Affinity in OpenMP
- 1 Introduction
- 2 Related Work
- 3 Design Choices for Task Affinity
- 4 Proposed Syntax and Semantics
- 5 Prototype Implementations
- 5.1 OpenMP Place/Thread Approach
- 5.2 Storage Location Approach
- 5.3 Taskgroup
- 5.4 Taskloop
- 6 Evaluation
- 6.1 Place/Thread
- 6.2 Storage Location
- 6.3 Taskgroup
- 6.4 Taskloop Construct
- 7 Conclusion
- References
- Towards Unifying OpenMP Under the Task-Parallel Paradigm
- 1 Introduction
- 2 Existing OpenMP Task-Loop Implementations
- 3 Improved Task-Loop Implementation and Load Balancing
- 3.1 Implementation
- 3.2 Iteration Tasks
- 4 Experimental Method
- 4.1 Benchmarks
- 5 Results
- 5.1 Scheduling Heuristics
- 5.2 Benchmark Performance
- 6 Related Work
- 7 Conclusions
- References
- A Case for Extending Task Dependencies
- 1 Introduction
- 2 Global Dependencies
- 3 Unstructured Tasks
- 4 Queue Dependencies
- 5 Related Work
- 6 Conclusions
- References
- OpenMP as a High-Level Specification Language for Parallelism
- 1 Motivation
- 2 HClib
- 3 Methods
- 3.1 OpenMP-to-X Compile-Time Mechanics
- 3.2 The HClib APIs Targeted by OpenMP-to-X
- 3.3 Mapping OpenMP to HClib
- 4 Experimental Evaluation
- 4.1 Variance of Each Runtime
- 4.2 Overall Performance
- 5 Discussion
- 5.1 Insights Gained into HClib
- 5.2 Motivating Extensions to OpenMP
- 5.3 Other Targets for OpenMP-to-X
- 6 Conclusions and Future Work
- References
- Scaling FMM with Data-Driven OpenMP Tasks on Multicore Architectures
- 1 Introduction
- 2 Related Work
- 3 About the FMM Case Study
- 4 Tasking Through Temporal and Spatial Blocking
- 5 Characterization and Evaluation
- 5.1 Characterizing Data Locality and Idleness
- 5.2 Performance Evaluation
- 6 Conclusion and Future Work
- References
- Extensions
- Reducing the Functionality Gap Between Auto-Vectorization and Explicit Vectorization
- Abstract
- 1 Introduction
- 2 Compress and Expand
- 2.1 Loop-Level Syntax
- 2.2 Block-Level Syntax
- 2.3 Semantics Discussion
- 2.4 Combination with omp declare simd
- 2.5 Unit Test Performance
- 3 Histogram
- 3.1 Loop-Level Syntax
- 3.2 Block-Level Syntax
- 3.3 Block-Level Syntax Scoping
- 3.4 Unit Test Performance
- 4 Conclusion
- References
- A Proposal to OpenMP for Addressing the CPU Oversubscription Challenge
- 1 Introduction
- 2 Use Cases and Challenges of Interoperability
- 2.1 Three Use Cases
- 2.2 Issues with No or Poor Interoperability
- 2.3 Limitation of Interoperability Support in the Standard
- 3 Extensions to Address the Oversubscription Challenge
- 3.1 Definition of the ACTIVE and PASSIVE Wait Policies
- 3.2 Proposed Runtime Routines
- 3.3 The omp_get_num_threads_runtime Runtime Routine
- 3.4 The omp_set_wait_policy and omp_get_wait_policy Runtime Routines
- 3.5 The omp_quiesce Runtime Routine
- 3.6 The omp_thread_create/exit/join Runtime Routines
- 4 Implementation and Evaluation
- 4.1 Evaluation
- 4.2 Performance with Regards to the Oversubscription Ratio
- 5 Related Work
- 6 Conclusions and Future Work
- References
- Tools
- Testing Infrastructure for OpenMP Debugging Interface Implementations
- 1 Introduction
- 2 Architecture
- 2.1 Debugging Interface
- 2.2 OpenMP Debugging Interface
- 2.3 OpenMP Runtime Functions
- 3 Implementation
- 3.1 Comparing Properties
- 3.2 Triggers for Checks
- 4 OMPD Callback Library for Dyninst
- 4.1 Interface of LibOmpdCallback
- 4.2 Using the LibOmpdCallback for Stackwalker
- 5 Issues Detected in the OMPD Library Implementation
- 6 Applicability for Other Debugging Interfaces
- 7 Conclusions
- References
- The Secrets of the Accelerators Unveiled: Tracing Heterogeneous Executions Through OMPT
- 1 Introduction
- 2 Background
- 3 Implementation
- 3.1 Runtime Support for OMPT
- 3.2 Instrumentation Support for OMPT
- 4 Experimental Setup
- 5 Results
- 5.1 Matrix Multiplication
- 5.2 Cholesky Decomposition
- 6 Related Work
- 7 Conclusions
- References
- Language-Centric Performance Analysis of OpenMP Programs with Aftermath
- 1 Introduction
- 2 Aftermath: Trace Generation and Analysis
- 2.1 Trace Generation
- 2.2 A Graphical Interface for Trace Analysis
- 3 Use Case: Optimization of MG
- 3.1 Identifying Execution Phases
- 3.2 Identifying Load Imbalance Resulting from NUMA
- 3.3 Identifying Parallelism Degree Limitations and Imbalance
- 4 Overhead of Tracing
- 5 Related Work
- 6 Conclusion and Future Work
- References
- Accelerator Programming
- Pragmatic Performance Portability with OpenMP 4.x
- 1 Introduction
- 1.1 Scope
- 2 Background
- 3 Implementation-Specific Interpretations
- 3.1 Thread Co-ordination
- 3.2 Cray Compiler Mapping of OpenMP onto NVIDIA GPUs
- 3.3 Clang Mapping of OpenMP onto NVIDIA GPUs
- 3.4 GCC 6.1 Mapping of OpenMP onto AMD GPUs with HSA
- 3.5 Intel Mapping of OpenMP
- 4 Performance Analysis
- 4.1 Individual Performance
- 4.2 Directives for Performance
- 5 Approaching Pragmatic Performance Portability
- 5.1 Homogenising the Directives
- 5.2 Patterns that Can Inhibit Performance Portability
- 5.3 Concluding Suggestions for Performance Portability
- 6 Related Work
- 7 Future Work
- 8 Conclusions
- References
- Multiple Target Task Sharing Support for the OpenMP Accelerator Model
- 1 Introduction
- 2 Accelerator Support in Directive-Based Approaches
- 2.1 Heterogeneity Support in OpenMP Accelerator Model
- 2.2 Heterogeneity Support in OpenACC
- 2.3 Heterogeneity Support in OmpSs
- 3 Proposal and Implementation of Multi Target Approach
- 3.1 Target Directive Syntax Extension
- 3.2 Compiler and Runtime Support to the Proposed Extensions
- 3.3 Compiler and Runtime Support for the resources Clause
- 4 Evaluation
- 4.1 System Configurations
- 4.2 OmpSs Runtime Configurations and Thread Binding
- 4.3 Performance Results
- 5 Conclusions and Future Work
- References
- Early Experiences Porting Three Applications to OpenMP 4.5
- 1 Introduction
- 2 Kripke
- 2.1 Challenges When Using Abstractions
- 3 Cardioid
- 3.1 High Performance and C++ Code Challenges
- 4 LULESH
- 4.1 Porting Challenges
- 4.2 Portable Performance Possible with Continued Compiler Work
- 5 Suggestions to Address Some Challenges
- 5.1 Clarify OpenMP's Relationship to Key C and C++ Constructs
- 5.2 Clarify Virtual Method Support
- 5.3 Add Deep Copy/Complex Data Structure Support
- 6 Related Work
- 7 Conclusions and Future Work
- References
- Design and Preliminary Evaluation of Omni OpenACC Compiler for Massive MIMD Processor PEZY-SC
- 1 Introduction
- 2 Related Work
- 3 PEZY-SC
- 3.1 Architecture
- 3.2 Programming
- 4 Omni OpenACC Compiler
- 4.1 Design
- 4.2 Implementation
- 5 Evaluation
- 5.1 Benchmark
- 5.2 Performance
- 5.3 Productivity
- 6 Discussion
- 6.1 Optimization for PEZY-SC
- 6.2 Comparison with OpenMP
- 7 Conclusion
- References
- Performance Evaluations and Optimization
- Evaluating OpenMP Implementations for Java Using PolyBench
- 1 Introduction
- 2 Related Work
- 2.1 OpenMP for Java
- 2.2 Benchmarks
- 3 PolyBench for Java OpenMP
- 4 Evaluation
- 5 Results
- 5.1 Long Runtimes
- 5.2 Medium Runtimes
- 5.3 Short Runtimes
- 5.4 Other Observations
- 6 Conclusions
- References
- Transactional Memory for Algebraic Multigrid Smoothers
- 1 Introduction
- 2 Transactional Memory State-of-the-Art
- 3 Applying Transactional Memory to the AMG Smoother
- 3.1 Brief Review of Algebraic Multigrid Methods
- 3.2 TM-Assisted Error-Smoothing in AMG
- 4 Experimental Results
- 4.1 Problem Descriptions
- 4.2 Convergence
- 4.3 Transactional Memory Statistics
- 4.4 Timed Performance
- 5 Concluding Remarks
- References
- Supporting Adaptive Privatization Techniques for Irregular Array Reductions in Task-Parallel Programming Models
- 1 Introduction
- 2 Generalization Through AMLs
- 3 Runtime Support
- 3.1 Handling AMLs by the Compiler
- 3.2 Handling AMLs by the Runtime
- 3.3 Handling Inspector-Executors
- 4 Language Support
- 5 Case Study
- 5.1 The Choice of AML
- 5.2 Performance Results
- 6 Related Work
- 7 Conclusion
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format displays a book page identically on any hardware, which makes it suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers and smartphones, however, PDFs are cumbersome to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, a personalised watermark is embedded in the eBook; in the event of misuse, it can be used to identify the purchaser and to provide evidence for legal purposes.
For more information, see our eBook Help page.