
Languages and Compilers for Parallel Computing

Content
- Title
- Preface
- Organization
- Table of Contents
- McFLAT: A Profile-Based Framework for MATLAB Loop Analysis and Transformations
- Introduction
- Overview of Our Approach
- Important Components of McFLAT
- Instrumenter
- Range Estimator
- Dependence Analysis
- Loop Transformations
- Parallelism Detection
- Current Limitations of McFLAT
- Experimental Results
- Benchmarks and Static Information
- Performance Study for Standard Loop Transformations
- Performance Study for Parallel For Loops
- Related Work
- Automatic Parallelization
- Adaptive Compilation
- Conclusions and Future Work
- References
- Static Analysis of Dynamic Schedules and Its Application to Optimization of Parallel Programs
- Introduction
- A Program Model with Explicit Scheduling
- Example of a Parallel Ordered Mapping Operation
- Additional Synchronization Primitives
- Schedule Analysis
- Datalog
- Pre-processing and Points-to Analysis
- Computing and Flattening the Abstract Schedule
- Computing Read- and Write-Sets
- Optimizations Based on Schedule Analysis
- Synchronization Removal
- Reducing Strong Atomicity Overhead
- Dependence Reduction
- Implementation and Future Work
- Related Work
- Concluding Remarks
- References
- Lowering STM Overhead with Static Analysis
- Introduction
- Background - Deuce, a Java-Based STM
- Optimization Opportunities
- Preventing Redundant Memory Accesses
- Preventing Redundant Writeset Operations
- Experimental Results
- Optimization Levels
- Benchmarks
- Optimization Opportunities Breakdown
- Analysis
- Related Work
- Conclusions and Further Work
- References
- A Parallel Numerical Solver Using Hierarchically Tiled Arrays
- Introduction
- SPIKE
- SPIKE Variants
- Hierarchically Tiled Arrays
- Implementing SPIKE with HTAs
- TU
- TA
- Experimental Results
- TU
- TA
- Related Work
- Conclusions
- References
- Tackling Cache-Line Stealing Effects Using Run-Time Adaptation
- Introduction
- Motivation
- What Cache-Line Stealing Is, and When It Occurs
- Experimental Setup
- Experimental Analysis of Cache-Line Stealing
- Possible Methods to Counter Cache-Line Stealing
- Run-Time Adaptation to Solve Cache-Line Stealing Using a Hybrid Software/Hardware Framework
- Description of the Adaptive Applications Framework
- Implementing the Adaptive Framework
- Related Work
- Conclusion and Future Work
- References
- Locality Optimization of Stencil Applications Using Data Dependency Graphs
- Introduction
- Background
- Stencil Applications
- Cyclops-64 and Many-Core Architectures
- DRAM Limitations for Many-Core Architectures
- Tiling
- Problem Formulation
- Optimal Tiling for Stencil Computations
- Implementation
- Experiments
- Results
- Conclusions
- Future Work
- References
- Array Regrouping on CMP with Non-uniform Cache Sharing
- Introduction
- Array Regrouping for Multithreading Applications on CMP
- Review of Basic Frequency-Based Affinity Analysis
- Cache-Sharing-Aware Reference Affinity Analysis
- Array Regrouping
- Evaluation
- Affinity-Guided Scheduling for Streamcluster
- Spatial Locality Enhancement for Summation
- Cache Conflict Reduction for Swim
- Related Work
- Conclusion
- References
- Sublimation: Expanding Data Structures to Enable Data Instance Specific Optimizations
- Introduction
- Sublimation
- Data Access Restructuring
- Identifying Injective Functions in Code
- Eliminating Indirect Addressing in the Loop Body
- Expanding the Iteration Space
- Restructuring in the Application Context
- Application of Sublimation to Pointer-Based Matrix Kernels
- Sparse Matrix Vector Multiply
- Jacobi Iteration
- Direct Solver
- Experiments
- Results on Sparse Matrix Kernels
- Overhead
- Conclusions
- References
- Optimizing and Auto-tuning Belief Propagation on the GPU
- Introduction
- Optimization Overview
- CUDA Belief Propagation
- Experimental Methodology
- Optimization Results: Register and Shared Memory Implementations
- Hybrid Implementation: Multiple Memory Modes in a Single Implementation
- Hybrid Results Discussion
- Auto-tuning Implementation
- Experiments Using Different GPUs
- Splitting Up the Image
- Related Work
- Conclusions and Future Work
- References
- A Programming Language Interface to Describe Transformations and Code Generation
- Introduction
- Compiler Structure and Motivation
- Requirements for Translating Sequential Loop Nests to CUDA
- Foundation from CHiLL Loop Transformation Recipes
- Using a Lua Programming Language Interface in CUDA-CHiLL
- Computation Decomposition of a Loop Nest: A Complex Transformation Sequence
- CUDA Code Generation
- Performance Results
- Related Work
- Summary and Future Work
- References
- Unified Parallel C for GPU Clusters: Language Extensions and Compiler Implementation
- Introduction
- Extending UPC with Hierarchical Parallelism
- UPC's Execution Model on GPGPU Clusters
- Hierarchical Data Distribution
- Loop Partitioning to Implicit Thread Tree
- Implementation on GPU Clusters
- Overview of the Compiling System
- Affinity-Aware Loop Tiling Transformation
- Memory Optimizations for CUDA
- Unified Data Management
- Experimental Results
- Programmability Evaluation
- Performance Evaluation
- Related Work
- Conclusion and Future Work
- References
- How Many Threads to Spawn during Program Multithreading?
- Introduction
- Terminology and Background
- The "What" and "How"
- Greedy Schedule
- Scheduling with Delayed Waits
- OPT-Driven Scheduling
- T-OPT Algorithm
- Remark
- Case Studies and Results
- Previous Work
- Compaction-Based Parallelization
- Multithreaded Performance
- Conclusion
- References
- Parallelizing Compiler Framework and API for Power Reduction and Software Productivity of Real-Time Heterogeneous Multicores
- Introduction
- OSCAR API Applicable Heterogeneous Multicore Architecture and Overview of the Compilation Flow
- OSCAR API Applicable Heterogeneous Multicore Architecture
- Compilation Flow
- A Compiler Framework for Heterogeneous Multicores
- Hint Directives for OSCAR Compiler
- OSCAR Parallelizing Compiler
- The Extension of OSCAR API for Heterogeneous Multicores
- Performance Evaluations on RP-X
- Evaluation Environment
- Performance by OSCAR Compiler with Accelerator Compiler
- Performance by OSCAR Compiler and Hand-tuned Library
- Evaluation of Power Consumption
- Conclusions
- References
- Debugging Large Scale Applications in a Virtualized Environment
- Introduction
- CharmDebug
- BigSim Emulator
- Debugging Charm++ Applications on BigSim
- Communicating with Virtual Processors
- Suspending Virtual Processors
- Debugging MPI Applications on BigSim
- Debugging Overhead in the Virtualized Environment
- Case Study
- Related Work
- Conclusions and Future Work
- References
- Optimizing the Exploitation of Multicore Processors and GPUs with OpenMP and OpenCL
- Introduction
- Proposal
- Brief Description of the Programming Model
- Matrix Multiply
- BlackScholes
- Perlin Noise
- Julia Set
- Evaluation
- Execution Environments
- On an Intel Xeon Server
- On the Cell/B.E. Processor
- On NVIDIA GPUs
- Related Work
- Conclusions and Future Work
- References
- CnC-CUDA: Declarative Programming for GPUs
- Introduction
- Background
- Concurrent Collections Programming (CnC) Model
- Habanero-Java Implementation of CnC
- GPU Architecture and the CUDA Programming Model
- Programming Interface and Implementation
- Graph File
- Item Collections
- Tag Collection - PutRegion and GetRegion
- CUDA Kernel
- Implementation Details
- Preliminary Experimental Results
- Experimental Setup
- Evaluation and Analysis
- Related Work
- Conclusions and Future Work
- References
- Parallel Graph Partitioning on Multicore Architectures
- Introduction
- Graph Partitioning Algorithms
- Metis
- ParMetis
- Amorphous Data-Parallelism
- The Galois System
- GMetis
- Optimizations
- Evaluation
- Methodology
- Results
- Conclusion
- References
- The STAPL pView
- Introduction
- STAPL Overview
- Related Work
- STAPL pView Concept
- Useful Views
- The pView Class
- Results: Expressivity, Genericity, and Performance
- Conclusion
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" form of copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, a personalised watermark is embedded in the eBook; it can be used to identify the purchaser in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.