
Languages and Compilers for Parallel Computing
Description
This book constitutes the thoroughly refereed post-conference proceedings of the 28th International Workshop on Languages and Compilers for Parallel Computing, LCPC 2015, held in Raleigh, NC, USA, in September 2015.
The 19 revised full papers were carefully reviewed and selected from 44 submissions. The papers are organized in topical sections on programming models, optimizing framework, parallelizing compiler, communication and locality, parallel applications and data structures, and correctness and reliability.

Content
- Intro
- Preface
- Organization
- Contents
- Programming Models
- Size Oblivious Programming with InfiniMem
- 1 Introduction
- 2 Size Oblivious Programming
- 3 The InfiniMem Programming Interface
- 4 InfiniMem's I/O Efficient Object Representation
- 5 Evaluation
- 5.1 Programmability
- 5.2 Performance
- 5.3 Scalability
- 5.4 Integration with Distributed Shared Memory (DSM)
- 6 Related Work
- 7 Conclusion
- References
- Low-Overhead Fault-Tolerance Support Using DISC Programming Model
- 1 Introduction
- 2 Related Work
- 3 DISC Programming Model
- 3.1 Domain and Subdomain
- 3.2 Attributes
- 3.3 Compute-Function and Computation-Space
- 3.4 Interaction Between Domain Elements
- 4 Fault-Tolerance Support
- 4.1 Checkpointing
- 4.2 Replication
- 5 Experiments
- 5.1 Checkpointing
- 5.2 Replication
- 6 Conclusion
- References
- Efficient Support for Range Queries and Range Updates Using Contention Adapting Search Trees
- 1 Introduction
- 2 Related Work
- 3 Contention Adapting Search Trees
- 4 Experiments
- 5 Concluding Remarks
- References
- Optimizing Framework
- Polyhedral Optimizations for a Data-Flow Graph Language
- 1 Introduction
- 2 Background
- 2.1 DFGL Model
- 2.2 Polyhedral Compilation Framework
- 3 Motivating Example
- 4 Converting DFGL to Polyhedral Representation
- 4.1 Embedded DFGL Programming Flow
- 4.2 DFGL Restrictions for Enabling Polyhedral Optimizations
- 5 Polyhedral Optimizations for DFGL
- 5.1 Polyhedral Representation of DFGL Program
- 5.2 Legality Analysis
- 5.3 Transformations
- 6 Experimental Results
- 7 Related Work
- 8 Conclusions
- References
- Concurrent Cilk: Lazy Promotion from Tasks to Threads in C/C++
- 1 Introduction
- 2 Background and Motivation
- 2.1 Blocking Deep in a Parallel Application
- 3 Programming Model
- 4 Another High-Level Interface: I/O Library
- 5 Low-Level Implementation and Scheduler
- 5.1 Adding the Concurrent Cilk Extensions
- 5.2 Scheduler Modifications
- 5.3 Optimized Pause/Resume Interface
- 6 Evaluation
- 6.1 Overhead of Concurrent Cilk Modifications
- 6.2 Scheduling Microbenchmarks
- 6.3 "Sync elision" and Exposing Parallelism
- 6.4 Servers with Per-Client Parallel Compute
- 7 Related Work
- 8 Conclusions and Future Work
- References
- Interactive Composition of Compiler Optimizations
- 1 Introduction
- 2 The Graphical User Interface
- 3 The POET Transformation Engine
- 4 Optimization Synthesis
- 4.1 Configuration Tables
- 4.2 The Algorithm
- 5 Experimental Evaluation
- 6 Related Work
- 7 Conclusions and Future Work
- References
- Asynchronous Nested Parallelism for Dynamic Applications in Distributed Memory
- 1 Introduction
- 2 Related Work
- 3 STAPL Overview
- 4 Asynchronous Nested Parallelism in STAPL
- 4.1 STAPL Design Considerations
- 4.2 Execution Model
- 4.3 One Sided Gang Creation
- 5 Experimental Evaluation
- 5.1 Minimum Element on Composed Containers
- 5.2 Graph Algorithms
- 6 Conclusion
- References
- Parallelizing Compiler
- Multigrain Parallelization for Model-Based Design Applications Using the OSCAR Compiler
- 1 Introduction
- 2 Framework for Parallelization of Model-based Design Applications
- 3 Exploiting Parallelism Using the OSCAR Compiler
- 3.1 Example of MATLAB/Simulink Application
- 3.2 Coarse Grain Task Parallel Processing
- 4 Multigrain Parallel Processing Method for MATLAB/Simulink Applications
- 4.1 Automatic Profiling in Model-based Development
- 4.2 Inline Expansion
- 4.3 Macro Task Fusion
- 4.4 Converting Loop Level Parallelism into Task Level Parallelism
- 5 Performance Evaluation of MATLAB/Simulink Applications on Multi-cores
- 5.1 Target MATLAB/Simulink Applications
- 5.2 Evaluation Environment
- 5.3 Performance Evaluation on Multi-cores
- 6 Conclusions
- References
- HYDRA: Extending Shared Address Programming for Accelerator Clusters
- 1 Introduction
- 2 Background
- 2.1 OMPD Baseline System
- 2.2 Array Data Flow Analysis
- 3 Extending Shared Address Programming Beyond CPU Clusters
- 3.1 HYDRA Programming Model
- 3.2 Compiler Analyses for Accelerator Data Management
- 4 Translation System Implementation
- 4.1 Supporting Multiple Accelerator Architectures
- 4.2 HYDRA Translation Process
- 4.3 HYDRA Runtime System
- 5 Evaluation
- 5.1 Experimental Setup
- 5.2 Scalability
- 5.3 Memory Allocation
- 6 Related Work
- 7 Conclusion
- References
- Petal Tool for Analyzing and Transforming Legacy MPI Applications
- 1 Introduction
- 2 Background
- 2.1 MPI Primitives
- 2.2 The ROSE Compiler Infrastructure
- 3 Implementation
- 3.1 Design
- 3.2 Blocking to Non-blocking Transformation
- 3.3 Non-persistent to Persistent Transformation
- 3.4 Discussion
- 4 Evaluation
- 4.1 Discussion of Results
- 5 Related Work
- 6 Conclusions and Future Work
- References
- Communication and Locality
- Automatic and Efficient Data Host-Device Communication for Many-Core Coprocessors
- 1 Introduction
- 2 Motivation and Problem Definition
- 2.1 Challenges in CPU-to-coprocessor Data Transfers
- 3 Background: Complete Linearization
- 3.1 Stride-Bucket Optimization
- 4 Compile-Time Automation of Data Transfers
- 4.1 Partial Linearization with Pointer Reset
- 4.2 Interaction with Compiler Optimizations
- 5 Evaluation
- 5.1 Implementation
- 5.2 Experimental Methodology
- 5.3 Results and Analysis
- 6 Related Work
- 7 Conclusions
- References
- Topology-Aware Parallelism for NUMA Copying Collectors
- 1 Introduction
- 2 Motivation
- 3 Topology-Aware Copying Collector
- 3.1 Data Structures
- 3.2 Algorithm
- 3.3 Optimization Schemes
- 4 Experimental Setup
- 4.1 System Configuration
- 4.2 Benchmarks
- 4.3 Evaluation Metrics
- 5 Evaluation
- 5.1 NUMA Locality Trace
- 5.2 Pause Time and VM Time Analysis
- 5.3 Scalability
- 6 Related Work
- 7 Conclusion
- References
- An Embedded DSL for High Performance Declarative Communication with Correctness Guarantees in C++
- 1 Introduction
- 2 Kanor Syntax
- 3 Kanor Semantics and Properties
- 3.1 Semantics
- 3.2 Properties
- 4 Optimizing Communication
- 4.1 Communication Knowledge
- 4.2 Communication Invariance
- 5 Implementation Status
- 6 Experiments
- 7 Related Work
- 8 Conclusion and Future Work
- References
- Parallel Applications and Data Structures
- PNNU: Parallel Nearest-Neighbor Units for Learned Dictionaries
- 1 Introduction
- 2 Background: Learned Dictionaries and Sparse Coding
- 3 Parallel Nearest Neighbor Unit (PNNU)
- 3.1 Technique T1 (NNU): Identification of Candidates for Reducing Dot-Product Computations
- 3.2 Technique T2: Dimension Reduction for Minimizing the Cost of Each Dot-Product Computation
- 3.3 Technique T3: Parallel Processing with Low Inter-core Communication Overheads
- 4 Probabilistic Analysis of PNNU
- 5 Experimental Results of PNNU on Three Applications
- 5.1 Application A1: Action Recognition
- 5.2 Matching Pursuit Algorithm with PNNU
- 5.3 Application A2: Object Classification
- 5.4 Application A3: Image Denoising
- 6 Conclusion
- References
- Coarse Grain Task Parallelization of Earthquake Simulator GMS Using OSCAR Compiler on Various Cc-NUMA Servers
- 1 Introduction
- 2 The Ground Motion Simulator GMS
- 3 Coarse Grain Task Parallelization of the GMS
- 3.1 Coarse Grain Task Parallelization
- 3.2 Modification of the GMS
- 3.3 Data Distribution to Distributed Shared Memories Using First Touch
- 3.4 Task Scheduling on Cc-NUMA
- 3.5 Locality Optimization of Boundary Calculations in FDM
- 3.6 Generated Compiler Friendly Sequential Program and its Parallel Compilation
- 4 Performance of the Parallelized GMS
- 4.1 Evaluation Environments
- 4.2 Comparison of Commercial Compilers and the Proposed Method
- 4.3 Performance on the Five Different Cc-NUMA Servers
- 4.4 Evaluations with Various Data Sizes
- 5 Conclusions
- References
- Conc-Trees for Functional and Parallel Programming
- 1 Introduction
- 2 Conc-Tree List
- 3 Conc-Tree Rope
- 4 Mutable Conc-Trees
- 5 Evaluation
- 6 Related Work
- 7 Conclusion
- References
- Correctness and Reliability
- Practical Floating-Point Divergence Detection
- 1 Introduction
- 2 Overview of Our Approach
- 3 Methodology
- 3.1 Abstract Binary Search (ABS)
- 3.2 Guided Random Testing (GRT)
- 4 Experimental Results
- 4.1 ABS Benchmarks
- 4.2 GRT Benchmarks
- 4.3 ABS Results
- 4.4 GRT Results
- 4.5 Random Testing
- 5 Related Work
- 6 Concluding Remarks
- References
- SMT Solving for the Theory of Ordering Constraints
- 1 Introduction
- 2 Motivation
- 3 Preliminaries
- 4 The Decision Procedure for Ordering Constraints
- 5 Integrating DPOC into DPLL(T)
- 5.1 The DPLL(T) Framework
- 5.2 Theory-Level Lemma Learning
- 6 Experimental Evaluation
- 7 Related Work
- 8 Conclusion
- References
- An Efficient, Portable and Generic Library for Successive Cancellation Decoding of Polar Codes
- 1 Introduction
- 2 Successive Cancellation Decoding of Polar Codes
- 2.1 Code Optimization Space
- 3 The P-EDGE Framework
- 3.1 The P-EDGE Decoder Generator
- 4 Low Level Building Blocks
- 5 Related Works
- 6 Evaluation
- 6.1 Comparison Between P-EDGE and the State of the Art
- 6.2 Exploring Respective Optimization Impacts with P-EDGE
- 7 Conclusion and Future Works
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino, and many more (Kindle: limited support only).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" copy protection. This means that no technical restrictions prevent illegal distribution. However, a personalised watermark is embedded in the eBook; it can be used to identify the purchaser in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.