Languages and Compilers for Parallel Computing

Name: Languages and Compilers for Parallel Computing | 23rd International Workshop, LCPC 2010, Houston, TX, USA, October 7-9, 2010. Revised Selected Papers
Brand: Springer
Price: 53.49 EUR
Availability: OnlineOnly

23rd International Workshop, LCPC 2010, Houston, TX, USA, October 7-9, 2010. Revised Selected Papers

Keith Cooper, John Mellor-Crummey, Vivek Sarkar (Editors)

Springer (Publisher)

Published on 24 February 2011

X, 278 pages

E-Book

PDF with digital watermarking

978-3-642-19595-2 (ISBN)

€53.49 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Title
Preface
Organization
Table of Contents
McFLAT: A Profile-Based Framework for MATLAB Loop Analysis and Transformations
Introduction
Overview of Our Approach
Important Components of McFLAT
Instrumenter
Range Estimator
Dependence Analysis
Loop Transformations
Parallelism Detection
Current Limitations of McFLAT
Experimental Results
Benchmarks and Static Information
Performance Study for Standard Loop Transformations
Performance Study for Parallel For Loops
Related Work
Automatic Parallelization
Adaptive Compilation
Conclusions and Future Work
References
Static Analysis of Dynamic Schedules and Its Application to Optimization of Parallel Programs
Introduction
A Program Model with Explicit Scheduling
Example of a Parallel Ordered Mapping Operation
Additional Synchronization Primitives
Schedule Analysis
Datalog
Pre-processing and Points-to Analysis
Computing and Flattening the Abstract Schedule
Computing Read- and Write-Sets
Optimizations Based on Schedule Analysis
Synchronization Removal
Reducing Strong Atomicity Overhead
Dependence Reduction
Implementation and Future Work
Related Work
Concluding Remarks
References
Lowering STM Overhead with Static Analysis
Introduction
Background - Deuce, a Java-Based STM
Optimization Opportunities
Preventing Redundant Memory Accesses
Preventing Redundant Writeset Operations
Experimental Results
Optimization Levels
Benchmarks
Optimization Opportunities Breakdown
Analysis
Related Work
Conclusions and Further Work
References
A Parallel Numerical Solver Using Hierarchically Tiled Arrays
Introduction
SPIKE
SPIKE Variants
Hierarchically Tiled Arrays
Implementing SPIKE with HTAs
TU
TA
Experimental Results
TU
TA
Related Work
Conclusions
References
Tackling Cache-Line Stealing Effects Using Run-Time Adaptation
Introduction
Motivation
What Cache-Line Stealing Is, and When It Occurs
Experimental Setup
Experimental Analysis of Cache-Line Stealing
Possible Methods to Counter Cache-Line Stealing
Run-Time Adaptation to Solve Cache-Line Stealing Using a Hybrid Software/Hardware Framework
Description of the Adaptive Applications Framework
Implementing the Adaptive Framework
Related Work
Conclusion and Future Work
References
Locality Optimization of Stencil Applications Using Data Dependency Graphs
Introduction
Background
Stencil Applications
Cyclops-64 and Many-Core Architectures
DRAM Limitations for Many-Core Architectures
Tiling
Problem Formulation
Optimal Tiling for Stencil Computations
Implementation
Experiments
Results
Conclusions
Future Work
References
Array Regrouping on CMP with Non-uniform Cache Sharing
Introduction
Array Regrouping for Multithreading Applications on CMP
Review of Basic Frequency-Based Affinity Analysis
Cache-Sharing-Aware Reference Affinity Analysis
Array Regrouping
Evaluation
Affinity-Guided Scheduling for Streamcluster
Spatial Locality Enhancement for Summation
Cache Conflict Reduction for Swim
Related Work
Conclusion
References
Sublimation: Expanding Data Structures to Enable Data Instance Specific Optimizations
Introduction
Sublimation
Data Access Restructuring
Identifying Injective Functions in Code
Eliminating Indirect Addressing in the Loop Body
Expanding the Iteration Space
Restructuring in the Application Context
Application of Sublimation to Pointer-Based Matrix Kernels
Sparse Matrix Vector Multiply
Jacobi Iteration
Direct Solver
Experiments
Results on Sparse Matrix Kernels
Overhead
Conclusions
References
Optimizing and Auto-tuning Belief Propagation on the GPU
Introduction
Optimization Overview
CUDA Belief Propagation
Experimental Methodology
Optimization Results: Register and Shared Memory Implementations
Hybrid Implementation: Multiple Memory Modes in a Single Implementation
Hybrid Results Discussion
Auto-tuning Implementation
Experiments Using Different GPUs
Splitting Up the Image
Related Work
Conclusions and Future Work
References
A Programming Language Interface to Describe Transformations and Code Generation
Introduction
Compiler Structure and Motivation
Requirements for Translating Sequential Loop Nests to CUDA
Foundation from CHiLL Loop Transformation Recipes
Using a Lua Programming Language Interface in CUDA-CHiLL
Computation Decomposition of a Loop Nest: A Complex Transformation Sequence
CUDA Code Generation
Performance Results
Related Work
Summary and Future Work
References
Unified Parallel C for GPU Clusters: Language Extensions and Compiler Implementation
Introduction
Extending UPC with Hierarchical Parallelism
UPC's Execution Model on GPGPU Clusters
Hierarchical Data Distribution
Loop Partitioning to Implicit Thread Tree
Implementation on GPU Clusters
Overview of the Compiling System
Affinity-Aware Loop Tiling Transformation
Memory Optimizations for CUDA
Unified Data Management
Experimental Results
Programmability Evaluation
Performance Evaluation
Related Work
Conclusion and Future Work
References
How Many Threads to Spawn during Program Multithreading?
Introduction
Terminology and Background
The "What" and "How"
Greedy Schedule
Scheduling with Delayed Waits
OPT-Driven Scheduling
T-OPT Algorithm
Remark
Case Studies and Results
Previous Work
Compaction-Based Parallelization
Multithreaded Performance
Conclusion
References
Parallelizing Compiler Framework and API for Power Reduction and Software Productivity of Real-Time Heterogeneous Multicores
Introduction
OSCAR API Applicable Heterogeneous Multicore Architecture and Overview of the Compilation Flow
OSCAR API Applicable Heterogeneous Multicore Architecture
Compilation Flow
A Compiler Framework for Heterogeneous Multicores
Hint Directives for OSCAR Compiler
OSCAR Parallelizing Compiler
The Extension of OSCAR API for Heterogeneous Multicores
Performance Evaluations on RP-X
Evaluation Environment
Performance by OSCAR Compiler with Accelerator Compiler
Performance by OSCAR Compiler and Hand-tuned Library
Evaluation of Power Consumption
Conclusions
References
Debugging Large Scale Applications in a Virtualized Environment
Introduction
CharmDebug
BigSim Emulator
Debugging Charm++ Applications on BigSim
Communicating with Virtual Processors
Suspending Virtual Processors
Debugging MPI Applications on BigSim
Debugging Overhead in the Virtualized Environment
Case Study
Related Work
Conclusions and Future Work
References
Optimizing the Exploitation of Multicore Processors and GPUs with OpenMP and OpenCL
Introduction
Proposal
Brief Description of the Programming Model
Matrix Multiply
BlackScholes
Perlin Noise
Julia Set
Evaluation
Execution Environments
On an Intel Xeon Server
On the Cell/B.E. Processor
On NVIDIA GPUs
Related Work
Conclusions and Future Work
References
CnC-CUDA: Declarative Programming for GPUs
Introduction
Background
Concurrent Collections Programming (CnC) Model
Habanero-Java Implementation of CnC
GPU Architecture and the CUDA Programming Model
Programming Interface and Implementation
Graph File
Item Collections
Tag Collection - PutRegion and GetRegion
CUDA Kernel
Implementation Details
Preliminary Experimental Results
Experimental Setup
Evaluation and Analysis
Related Work
Conclusions and Future Work
References
Parallel Graph Partitioning on Multicore Architectures
Introduction
Graph Partitioning Algorithms
Metis
ParMetis
Amorphous Data-Parallelism
The Galois System
GMetis
Optimizations
Evaluation
Methodology
Results
Conclusion
References
The STAPL pView
Introduction
STAPL Overview
Related Work
STAPL pView Concept
Useful Views
The pView Class
Results: Expressivity, Genericity, and Performance
Conclusion
References
Author Index
