Languages and Compilers for Parallel Computing

Name: Languages and Compilers for Parallel Computing | 28th International Workshop, LCPC 2015, Raleigh, NC, USA, September 9-11, 2015, Revised Selected Papers
Brand: Springer
Price: 53.49 EUR
Availability: Online only

Xipeng Shen, Frank Mueller, James Tuck (Editors)

Springer (Publisher)

Published on 19 February 2016

X, 319 pages

E-Book

PDF with digital watermarking

978-3-319-29778-1 (ISBN)

€53.49 incl. 7% VAT

System requirements for PDF with digital watermarking

E-Book Single Licence

Available for download

Content

Intro
Preface
Organization
Contents
Programming Models
Size Oblivious Programming with InfiniMem
1 Introduction
2 Size Oblivious Programming
3 The InfiniMem Programming Interface
4 InfiniMem's I/O Efficient Object Representation
5 Evaluation
5.1 Programmability
5.2 Performance
5.3 Scalability
5.4 Integration with Distributed Shared Memory (DSM)
6 Related Work
7 Conclusion
References
Low-Overhead Fault-Tolerance Support Using DISC Programming Model
1 Introduction
2 Related Work
3 DISC Programming Model
3.1 Domain and Subdomain
3.2 Attributes
3.3 Compute-Function and Computation-Space
3.4 Interaction Between Domain Elements
4 Fault-Tolerance Support
4.1 Checkpointing
4.2 Replication
5 Experiments
5.1 Checkpointing
5.2 Replication
6 Conclusion
References
Efficient Support for Range Queries and Range Updates Using Contention Adapting Search Trees
1 Introduction
2 Related Work
3 Contention Adapting Search Trees
4 Experiments
5 Concluding Remarks
References
Optimizing Framework
Polyhedral Optimizations for a Data-Flow Graph Language
1 Introduction
2 Background
2.1 DFGL Model
2.2 Polyhedral Compilation Framework
3 Motivating Example
4 Converting DFGL to Polyhedral Representation
4.1 Embedded DFGL Programming Flow
4.2 DFGL Restrictions for Enabling Polyhedral Optimizations
5 Polyhedral Optimizations for DFGL
5.1 Polyhedral Representation of DFGL Program
5.2 Legality Analysis
5.3 Transformations
6 Experimental Results
7 Related Work
8 Conclusions
References
Concurrent Cilk: Lazy Promotion from Tasks to Threads in C/C++
1 Introduction
2 Background and Motivation
2.1 Blocking Deep in a Parallel Application
3 Programming Model
4 Another High-Level Interface: I/O Library
5 Low-Level Implementation and Scheduler
5.1 Adding the Concurrent Cilk Extensions
5.2 Scheduler Modifications
5.3 Optimized Pause/Resume Interface
6 Evaluation
6.1 Overhead of Concurrent Cilk Modifications
6.2 Scheduling Microbenchmarks
6.3 "Sync elision" and Exposing Parallelism
6.4 Servers with Per-Client Parallel Compute
7 Related Work
8 Conclusions and Future Work
References
Interactive Composition of Compiler Optimizations
1 Introduction
2 The Graphical User Interface
3 The POET Transformation Engine
4 Optimization Synthesis
4.1 Configuration Tables
4.2 The Algorithm
5 Experimental Evaluation
6 Related Work
7 Conclusions and Future Work
References
Asynchronous Nested Parallelism for Dynamic Applications in Distributed Memory
1 Introduction
2 Related Work
3 stapl Overview
4 Asynchronous Nested Parallelism in stapl
4.1 stapl Design Considerations
4.2 Execution Model
4.3 One Sided Gang Creation
5 Experimental Evaluation
5.1 Minimum Element on Composed Containers
5.2 Graph Algorithms
6 Conclusion
References
Parallelizing Compiler
Multigrain Parallelization for Model-Based Design Applications Using the OSCAR Compiler
1 Introduction
2 Framework for Parallelization of Model-based Design Applications
3 Exploiting Parallelism Using the OSCAR Compiler
3.1 Example of MATLAB/Simulink Application
3.2 Coarse Grain Task Parallel Processing
4 Multigrain Parallel Processing Method for MATLAB/Simulink Applications
4.1 Automatic Profiling in Model-based Development
4.2 Inline Expansion
4.3 Macro Task Fusion
4.4 Converting Loop Level Parallelism into Task Level Parallelism
5 Performance Evaluation of MATLAB/Simulink Applications on Multi-cores
5.1 Target MATLAB/Simulink Applications
5.2 Evaluation Environment
5.3 Performance Evaluation on Multi-cores
6 Conclusions
References
HYDRA: Extending Shared Address Programming for Accelerator Clusters
1 Introduction
2 Background
2.1 OMPD Baseline System
2.2 Array Data Flow Analysis
3 Extending Shared Address Programming Beyond CPU Clusters
3.1 HYDRA Programming Model
3.2 Compiler Analyses for Accelerator Data Management
4 Translation System Implementation
4.1 Supporting Multiple Accelerator Architectures
4.2 HYDRA Translation Process
4.3 HYDRA Runtime System
5 Evaluation
5.1 Experimental Setup
5.2 Scalability
5.3 Memory Allocation
6 Related Work
7 Conclusion
References
Petal Tool for Analyzing and Transforming Legacy MPI Applications
1 Introduction
2 Background
2.1 MPI Primitives
2.2 The ROSE Compiler Infrastructure
3 Implementation
3.1 Design
3.2 Blocking to Non-blocking Transformation
3.3 Non-persistent to Persistent Transformation
3.4 Discussion
4 Evaluation
4.1 Discussion of Results
5 Related Work
6 Conclusions and Future Work
References
Communication and Locality
Automatic and Efficient Data Host-Device Communication for Many-Core Coprocessors
1 Introduction
2 Motivation and Problem Definition
2.1 Challenges in CPU-to-coprocessor Data Transfers
3 Background: Complete Linearization
3.1 Stride-Bucket Optimization
4 Compile-Time Automation of Data Transfers
4.1 Partial Linearization with Pointer Reset
4.2 Interaction with Compiler Optimizations
5 Evaluation
5.1 Implementation
5.2 Experimental Methodology
5.3 Results and Analysis
6 Related Work
7 Conclusions
References
Topology-Aware Parallelism for NUMA Copying Collectors
1 Introduction
2 Motivation
3 Topology-Aware Copying Collector
3.1 Data Structures
3.2 Algorithm
3.3 Optimization Schemes
4 Experimental Setup
4.1 System Configuration
4.2 Benchmarks
4.3 Evaluation Metrics
5 Evaluation
5.1 NUMA Locality Trace
5.2 Pause Time and VM Time Analysis
5.3 Scalability
6 Related Work
7 Conclusion
References
An Embedded DSL for High Performance Declarative Communication with Correctness Guarantees in C++
1 Introduction
2 Kanor Syntax
3 Kanor Semantics and Properties
3.1 Semantics
3.2 Properties
4 Optimizing Communication
4.1 Communication Knowledge
4.2 Communication Invariance
5 Implementation Status
6 Experiments
7 Related Work
8 Conclusion and Future Work
References
Parallel Applications and Data Structures
PNNU: Parallel Nearest-Neighbor Units for Learned Dictionaries
1 Introduction
2 Background: Learned Dictionaries and Sparse Coding
3 Parallel Nearest Neighbor Unit (PNNU)
3.1 Technique T1 (NNU): Identification of Candidates for Reducing Dot-Product Computations
3.2 Technique T2: Dimension Reduction for Minimizing the Cost of Each Dot-Product Computation
3.3 Technique T3: Parallel Processing with Low Inter-core Communication Overheads
4 Probabilistic Analysis of PNNU
5 Experimental Results of PNNU on Three Applications
5.1 Application A1: Action Recognition
5.2 Matching Pursuit Algorithm with PNNU
5.3 Application A2: Object Classification
5.4 Application A3: Image Denoising
6 Conclusion
References
Coarse Grain Task Parallelization of Earthquake Simulator GMS Using OSCAR Compiler on Various Cc-NUMA Servers
1 Introduction
2 The Ground Motion Simulator GMS
3 Coarse Grain Task Parallelization of the GMS
3.1 Coarse Grain Task Parallelization
3.2 Modification of the GMS
3.3 Data Distribution to Distributed Shared Memories Using First Touch
3.4 Task Scheduling on Cc-NUMA
3.5 Locality Optimization of Boundary Calculations in FDM
3.6 Generated Compiler Friendly Sequential Program and its Parallel Compilation
4 Performance of the Parallelized GMS
4.1 Evaluation Environments
4.2 Comparison of Commercial Compilers and the Proposed Method
4.3 Performance on the Five Different Cc-NUMA Servers
4.4 Evaluations with Various Data Sizes
5 Conclusions
References
Conc-Trees for Functional and Parallel Programming
1 Introduction
2 Conc-Tree List
3 Conc-Tree Rope
4 Mutable Conc-Trees
5 Evaluation
6 Related Work
7 Conclusion
References
Correctness and Reliability
Practical Floating-Point Divergence Detection
1 Introduction
2 Overview of Our Approach
3 Methodology
3.1 Abstract Binary Search (ABS)
3.2 Guided Random Testing (GRT)
4 Experimental Results
4.1 ABS Benchmarks
4.2 GRT Benchmarks
4.3 ABS Results
4.4 GRT Results
4.5 Random Testing
5 Related Work
6 Concluding Remarks
References
SMT Solving for the Theory of Ordering Constraints
1 Introduction
2 Motivation
3 Preliminaries
4 The Decision Procedure for Ordering Constraints
5 Integrating DPOC into DPLL(T)
5.1 The DPLL(T) Framework
5.2 Theory-Level Lemma Learning
6 Experimental Evaluation
7 Related Work
8 Conclusion
References
An Efficient, Portable and Generic Library for Successive Cancellation Decoding of Polar Codes
1 Introduction
2 Successive Cancellation Decoding of Polar Codes
2.1 Code Optimization Space
3 The P-EDGE Framework
3.1 The P-EDGE Decoder Generator
4 Low Level Building Blocks
5 Related Works
6 Evaluation
6.1 Comparison Between P-EDGE and the State of the Art
6.2 Exploring Respective Optimization Impacts with P-EDGE
7 Conclusion and Future Works
References
Author Index