Euro-Par 2015: Parallel Processing Workshops

Name: Euro-Par 2015: Parallel Processing Workshops | Euro-Par 2015 International Workshops, Vienna, Austria, August 24-25, 2015, Revised Selected Papers
Brand: Springer
Price: 53.49 EUR
Availability: OnlineOnly

Sascha Hunold, Alexandru Costan, Domingo Giménez, Alexandru Iosup, Laura Ricci, María Engracia Gómez Requena, Vittorio Scarano, Ana Lucia Varbanescu, Stephen L. Scott, Stefan Lankes, Josef Weidendorfer, Michael Alexander (Editors)

Springer (Publisher)

Published on 17 December 2015

XLIII, 839 pages

E-Book

PDF with digital watermarking

978-3-319-27308-2 (ISBN)

€53.49 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Intro
Preface
Organization
Workshop Introduction and Organization
4th Workshop on Big Data Management in Clouds (BigDataCloud)
First European Workshop on Parallel and Distributed Computing Education for Undergraduate Students (Euro-EDUPAR)
13th International Workshop on Algorithms, Models, and Tools for Parallel Computing on Heterogeneous Platforms (HeteroPar)
Third Workshop on Large-Scale Distributed Virtual Environments (LSDVE)
4th International Workshop on On-Chip Memory Hierarchies and Interconnects (OMHI)
Third Workshop on Parallel and Distributed Agent-Based Simulations (PADABS)
First Workshop on Performance Engineering for Large-Scale Graph Analytics (PELGA)
Second International Workshop on Reproducibility in Parallel Computing (REPPAR)
8th Workshop on Resiliency in High-Performance Computing in Clusters, Clouds, and Grids (Resilience)
Third Workshop on Runtime and Operating Systems for the Many-Core Era (ROME)
8th Workshop on UnConventional High-Performance Computing 2015 (UCHPC)
10th Workshop on Virtualization in High-Performance Cloud Computing (VHPC)
Contents
BigDataCloud - Big Data Management in Clouds
Distributed Range-Based Meta-Data Management for an In-Memory Storage
1 Introduction
2 DXRAM Architecture
2.1 Chunks
2.2 Super-Peer Overlay
3 CID-Ranges
3.1 CID-Tree
3.2 Backup Nodes Integration
3.3 Client-Side Caching
4 Evaluation
4.1 CID-Tree
4.2 Client-Side Caching
4.3 BG Benchmark
5 Related Work
6 Conclusions
References
Network-Based Data Processing Architecture for Reliable and High-Performance Distributed Storage System
1 Introduction
1.1 Background
1.2 Our Contribution
2 Related Work
3 System Design
3.1 Network-Based Data Processing Architecture
3.2 Overview of the System
3.3 Data Layout
3.4 Switch Architecture
3.5 Fallback Mode
3.6 Prototype Implementation Overview
3.7 Optimized Data Transfer and Processing with RDMA
4 Evaluation
4.1 Evaluation Target and Conditions
4.2 Evaluation Results
5 Conclusion and Future Work
References
File-Less Approach to Large Scale Data Management
1 Introduction
2 Related Work
3 Filess Vision
4 Filess Data Model
4.1 Hypergraphs
4.2 Overview
4.3 Object Composition and Decomposition
5 Representing Existing Data Structures and Formats in Filess
6 Prototype Design and Implementation
7 Conclusions
References
Euro-EDUPAR - Parallel and Distributed Computing Education for Undergraduate Students
Parallel Computing vs. Distributed Computing: A Great Confusion? (Position Paper)
1 A (Very) Quick Look at Parallel Computing
2 What Is Distributed Computing
3 A Fundamental Difference Between Parallel Computing and Distributed Computing
4 On the Computational Side: The Hardness of Distributed Computing
5 Parallel vs. Distributed Computing: A Schematic View
6 An Approach to Teach Distributed Computing
7 Distributed Algorithms at the Undergraduate Level
8 Distributed Algorithms at the Graduate Level
9 When Communication Is Through a Shared Memory
10 When Communication Is by Message-Passing
11 Conclusion
A The Non-blocking Atomic Commit Problem
B Remark on the Notion of a Consensus Number of an Object
References
SAUCE: A Web-Based Automated Assessment Tool for Teaching Parallel Programming
1 Introduction
2 Related Work
3 Technical Aspects
3.1 Python
3.2 SAUCE Web Application
3.3 Learning Tools Interoperability
3.4 Security Considerations
3.5 Distributed Execution
4 Use Cases
4.1 Solving the Poisson Equation Using MPI
4.2 Odd-Even Sort Using OpenMP
4.3 Array Reversal Using CUDA
4.4 Grading Features
5 Conclusion
References
Teaching Parallel Programming in Interdisciplinary Studies
1 Introduction
2 Basic Concepts for Interdisciplinary Students
3 Parallel Programming
3.1 Shared Memory: OpenMP
3.2 Message Passing: MPI
3.3 GPUs: CUDA
3.4 Performance Analysis: Tools
4 Applied Modelling and Simulation
5 Conclusions
References
On-line Service for Teaching Parallel Programming
1 Introduction
2 Motivation
3 ZawodyWeb System
3.1 Overview
3.2 Technical Details
3.3 Functionality
4 UNICORE
5 ZawodyWeb Support for Parallel Computing
6 Supported Languages
6.1 OpenMP
6.2 MPI
6.3 PCJ
7 Results
7.1 Practical Evaluation
8 Conclusions
References
Challenges of a Systematic Approach to Parallel Computing and Supercomputing Education
1 Introduction
2 Supercomputing Education Infrastructure
3 Supercomputing Consortium of Russian Universities
4 Supercomputing Education National Project
5 Supercomputing Education in Russia's Universities Today
5.1 Supercomputing Education at Lomonosov Moscow State University
5.2 Supercomputing Education at the Lobachevsky Nizhny Novgorod State University
6 Supercomputer Technologies and School Education
7 Conclusion
References
Teaching Heart Modeling and Simulation on Parallel Computing Systems
1 Introduction
2 Related Work
3 The Course Track "Heart Modeling and Simulation on Parallel Computing Systems"
3.1 General Course Track Description
3.2 Prerequisite Courses
3.3 Computational Resources
4 Parallel and Distributed Computing Module
4.1 Parallel and Distributed Computing
4.2 GPU Programming
4.3 Xeon Phi Programming
5 Numerical Methods Module
5.1 Parallel Numerical Methods
5.2 Science Hackathon
6 Heart Modeling Module
6.1 Simulation of Living Systems
6.2 Modeling Heart Dynamics on Parallel Computing Systems
7 Discussion
8 Conclusion
References
Integration of ICT in Concurrent and Parallel Programming Lectures
1 Introduction
1.1 Environment
1.2 Objectives
1.3 Time Schedule
2 What Has Been Innovated?
2.1 Development Methodology
3 Results
3.1 Pre-assessment
3.2 Post-assessment
4 Conclusions and Future Work
References
Teamwork Across Disciplines: High-Performance Computing Meets Engineering
1 Interdisciplinary Education and Teamwork
1.1 Introduction
1.2 Challenges
1.3 Outline
2 Course Curriculum
2.1 Teamwork Across Disciplines: Concept
2.2 Realization: Turbulent Flow Simulation on HPC-Systems
3 Evaluation
4 Conclusion
References
An Educational Module Illustrating How Sparse Matrix-Vector Multiplication on Parallel Processors Connects to Graph Partitioning
1 Introduction
2 A Simple Sparse Matrix Data Structure
3 Sparse Matrix-Vector Multiplication Goes Parallel
4 An Undirected Graph Model for Data Partitioning
5 An Educational Module Illustrating the Connection
6 Related Work
7 Concluding Remarks
References
FERBJMON Tools - Visualizing Thread Access on Java Objects using Lightweight Runtime Monitoring
1 Introduction
2 Related Work
3 Java Runtime Monitoring Using FERBJMON Tools
3.1 Bytecode Instrumentation
3.2 FerbJmon Call Graph
3.3 FERBJMON Timeline Diagram of Thread Accesses
4 Examples
4.1 Producer and Consumer
4.2 Cooperative Task Execution
5 Performance of FerbJmon Runtime Monitoring
6 Conclusion
References
Interdisciplinary Practical Course on Parallel Finite Element Method Using HiFlow3
1 Introduction
2 HiFlow3
3 Practical Course on Parallel Numerics
4 Summary and Future Work
References
HeteroPar - Algorithms, Models, and Tools for Parallel Computing on Heterogeneous Platforms
A Randomized LU-based Solver Using GPU and Intel Xeon Phi Accelerators
1 Introduction
2 Hybrid RBT Solver
3 RBT for Graphics Processing Units
3.1 Implementation
3.2 Performance Results
4 RBT for Intel Xeon Phi
4.1 Implementation
4.2 Performance Results
5 Conclusion
References
Identifying Optimization Opportunities Within Kernel Execution in GPU Codes
1 Introduction
1.1 Motivation
1.2 Contributions
2 Background
3 Methodology
3.1 Static Analysis
3.2 Dynamic Analysis
3.3 Instruction Operation Metrics
4 Analysis
4.1 Applications
4.2 Methodology
4.3 Results
5 Related Work
6 Conclusion and Future Work
References
Modeling Contention and Mapping Effects in Multi-core Clusters
1 Introduction
2 Related Work
3 Modeling Parallel Algorithms
4 Case Study 1: Analyzing the Effect of the Contention in Shared Memory
5 Case Study 2: Modeling the Mapping Effects on Multi-core Clusters
6 Test Platforms
7 Conclusions
References
Towards Community Detection on Heterogeneous Platforms
1 Introduction
2 Background
2.1 The WCC Metric
2.2 The Scalable Community Detection Algorithm
3 Design and Implementation
3.1 The Massively Parallel Version
3.2 The Heterogeneous Version
3.3 Automatic Partitioning
4 Evaluation
4.1 The GPU Version
4.2 The Heterogeneous Version
4.3 End-to-End Performance
5 Related Work
6 Conclusion and Future Work
References
A Design Proposal for a Next Generation Scientific Software Framework
1 Introduction
2 Requirements
3 Approach
3.1 Embedded Domain-Specific Languages
3.2 Tiling
3.3 Task Based Runtime Support
3.4 Proposed Architecture
4 Example: Structured AMR
4.1 Granularities and Decomposition
4.2 Micro-parallelism
4.3 Solvers
5 Conclusions
References
Accelerating Direction-Optimized Breadth First Search on Hybrid Architectures
1 Introduction
2 Challenges and Opportunities
2.1 Improving Performance with Hardware Accelerators
2.2 Improving Performance with Direction-Optimized BFS
3 The Hybrid Algorithm
3.1 Direction-Optimized BFS for Partitioned Graphs
3.2 Partition Specialization
3.3 Switching Processing Direction for a Partitioned Graph
3.4 Optimizations to Improve Access Locality
4 Experimental Results
4.1 The Impact of Specialized Partitioning
4.2 Comparison with Past Work Using Real-World Graphs
4.3 The Energy Case
5 Summary
References
FiNS: A Framework for Accelerating Nested Simulations on Heterogeneous Platforms
1 Introduction
2 Financial Background
3 GPU Background
3.1 CUDA Streams and Hyper-Q
3.2 Multi Processing Service
4 Framework Architecture
4.1 The Framework
4.2 Streams in Practice
5 Nested Simulation for ALM Tooling: A Case Study
5.1 Application Description
5.2 Using the Framework
5.3 Evaluation
6 Related Work
7 Conclusion and Future Work
References
Communication Models Insights Meet Simulations
1 Introduction
2 Related Work
3 Problem Description
4 Simulation Framework
5 Evaluation
5.1 Platform and Jobs Description
5.2 Competing Heuristics
5.3 Homogeneous Platform Experiment
5.4 Heterogeneous Platform Experiments
6 Conclusion
References
LSDVE - Large Scale Distributed Virtual Environments
Community Discovery for Interest Management in DVEs: A Case Study
1 Introduction
2 Related Work
3 Reference Architecture
3.1 Coverage Peer Sampling
4 Distributed Community Discovery Protocols
4.1 GROUP
4.2 Affinity Propagation
5 Experimental Evaluation
5.1 Hotspots Approximation
5.2 Area Coverage
5.3 Message Number and Computational Overhead
6 Conclusion
References
Continuation Complexity: A Callback Hell for Distributed Systems
1 Introduction
2 Continuation Complexity Problem
2.1 A Simple Example: Chord
3 Overcoming the Continuation Complexity Problem in Actor Models
3.1 Concurrency Control
4 PyActive Abstractions
4.1 A Complete Example: Chord
5 Evaluating the Expressiveness and Simplicity of Our Approach
6 Conclusions
References
Offloading Service Provisioning on Mobile Devices in Mobile Cloud Computing Environments
1 Introduction
2 Related Work
3 Hybrid Mobile Cloud Computing Solution for Service Provisioning
3.1 Resolution Process
3.2 Data Collection
3.3 Evaluation of Service Provisioning Alternatives
4 System Evaluation
4.1 Service Provisioning Time Comparison
4.2 Split of Service Executions in the Hybrid Approach
5 Conclusions
References
A Systematic Quality Analysis of Virtual Desktop Infrastructure Technologies
1 Introduction
2 Considered Virtual Desktop Infrastructures
2.1 Citrix XenDesktop
2.2 VMware Horizon View
2.3 Microsoft Virtual Desktop Infrastructure
3 Technology Implementation and Evaluation Setup
4 Benchmarking of the VDI Solutions
4.1 Setup: Activities Based on Identified User Types
4.2 Setup: Network Variations
4.3 Video Streaming Quality: Impact of Individual Network Factors
4.4 Streaming Quality of Solutions in Various Network Conditions
5 Conclusions
References
A Trustworthy Distributed Social Carpool Method
1 Introduction
2 Related Work
3 Carpooling Platform
4 The Reputation Algorithm
5 Security of the Scheme
6 The Android Application
7 Conclusions
References
OMHI - On-Chip Memory Hierarchies and Interconnects: Organization, Management and Implementation
Efficient DVFS Operation in NoCs Through a Proper Congestion Management Strategy
1 Introduction
2 Related Work
3 ICARO-DVFS Implementation
3.1 Dynamic Voltage and Frequency Scaling
3.2 Voltage and Frequency Islands
3.3 ICARO
3.4 Merging ICARO with DVFS
3.5 Different ICARO-DVFS Alternatives
3.6 ICARO-DVFS Performance Analysis
4 Conclusions and Future Work
References
Superoptimizing Memory Subsystems for Multiple Objectives
1 Introduction
2 Related Work
3 Method
4 Benchmarks
5 Results
5.1 Minimizing Writes
5.2 Multi-objective Superoptimization
6 Conclusions
References
PADABS - Parallel and Distributed Agent-Based Simulations
On Evaluating Graph Partitioning Algorithms for Distributed Agent Based Models on Networks
1 Introduction
1.1 Our Results
2 The Graph Partitioning Problem
3 Experiment Setting
4 Results
4.1 Analytical Results
4.2 Real Setting Results
4.3 Correlation Between Analytical and Real Setting Results
5 Conclusion
References
Distributed Agent-Based Simulation and GIS: An Experiment with the Dynamics of Social Norms
1 Introduction
2 Experiment
3 Results
3.1 Experiments Settings
3.2 Analytical Analysis of ABM and GIS
4 Conclusion
References
Behavioral Spherical Harmonics for Long-Range Agents' Interaction
1 Introduction
2 Background
3 Behavioral Spherical Harmonics
3.1 Spherical Harmonics
3.2 Projection and Reconstruction
3.3 Behavioral Spherical Harmonics
4 Implementation
4.1 Spatial Subdivision with Grid
4.2 Projection of Directionality into SH Coefficients
4.3 Reconstruction of the Avoidance Direction
5 Results
6 Conclusion
References
Graph-Based Automatic Dynamic Load Balancing for HPC Agent-Based Simulations
1 Introduction
2 3D Spatial Agent Organisation
3 Graph-Based Agent Partitioning
4 Automatic Dynamic Load Balancing
5 Experimental Results
6 Conclusion and Future Work
References
Preliminary Evaluation of a Parallel Trace Replay Tool for HPC Network Simulations
1 Introduction
2 Background and Related Work
2.1 Related Work
3 Design and Implementation of TraceR
3.1 Running TraceR in Optimistic Mode
4 Parameter Choices for TraceR
5 Experimental Setup and Configuration Parameters
5.1 Conservative Versus Optimistic Simulation
5.2 Effect of Batch Size and GVT Interval
5.3 Impact of Number of LPs per KP
6 Performance Comparison, Scaling and Validation
6.1 Comparison with Sequential Executions
6.2 Parallel Scaling and Validation of TraceR
7 Conclusion
References
Road Network Simulation Using FLAME GPU
1 Introduction
2 Related Work
3 Model
4 Implementation
4.1 Scalable Artificial Road Network
4.2 FLAME GPU
4.3 Implementing Gipps' Car Following Model Using FLAME GPU
4.4 Visualisation and Graphics Techniques
5 Experimental Results
5.1 Vehicle Scaling for Static Road Network
5.2 Scaling of Vehicle and Road Network
5.3 Visualisation
6 Conclusions
References
A Communication Schema for Parallel and Distributed Multi-agent Systems Based on MPI
1 Introduction
2 Related Work
3 Implementing PDMAS on HPC Platforms
4 Proposition
5 Communication Schema (Receive Message Phase)
6 Proxy System (Agents Update Phase)
7 Experimentation
8 Conclusion and Perspectives
References
Large-Scale Agent-Based Modeling with Repast HPC: A Case Study in Parallelizing an Agent-Based Model
Abstract
1 Introduction
2 The CA-MRSA Model
2.1 Model Structure
2.2 Implementation
3 From the CA-MRSA Model to the Social Interaction Model
3.1 OpenMP
3.2 Parallelizing the Model with MPI
3.2.1 Minimizing the Data Cost
3.2.2 Minimizing the Sending Frequency
4 Conclusion
Acknowledgements
References
RAMSES: Reversibility-Based Agent Modeling and Simulation Environment with Speculation-Support
1 Introduction
2 Related Work
3 RAMSES
3.1 Reference Programming Model
3.2 Exposed API
3.3 Tracking Memory Updates for Reversibility
3.4 Runtime Execution Support
4 Experimental Results
5 Conclusions
References
PELGA - Performance Engineering for Large-scale Graph Analytics
Can Embedding Solve Scalability Issues for Mixed-Data Graph Clustering?
Abstract
1 Introduction
2 State of the Art and Related Work
2.1 Graph Clustering Approaches
2.2 Mixed Data Type Clustering Approaches
2.3 K-Means Clustering Algorithm
2.4 Graph Embedding Systems
3 Graph Coordinates Approach: Embedding + Clustering
3.1 Embedding
3.2 Clustering
3.3 Complete Process
4 Scalability Testing
4.1 Experimental Setup
4.2 Results
4.3 Comparison with Other Methods
4.4 Interactivity
4.5 Accuracy
4.6 Discussion About the Parallelization of the Clustering and Embedding Processes
5 Conclusions and Future Work
References
Using the Marshall-Olkin Extended Zipf Distribution in Graph Generation
1 Introduction
2 The MOEZipf Model
3 Real Graphs Analysis
4 Generating MOEZipf Degree Samples
5 Scalable MOEZipf Generation with Datagen
6 Conclusions and Future Work
References
Highspeed Graph Processing Exploiting Main-Memory Column Stores
1 Introduction
2 Related Work
3 System Architecture
3.1 Columnar Graph Storage
3.2 Secondary Graph Index Structure
4 Implementation Details
4.1 Basic Graph Operations & Building Blocks
4.2 Query Implementation
4.3 Memory Consumption
5 Evaluation
5.1 NUMA Effects
5.2 Performance
6 Conclusion
References
A Multi-layer Framework for Graph Processing via Overlay Composition
1 Introduction
2 The Telos Framework
2.1 Protocols
3 Evaluation
3.1 Torus Overlay
3.2 Scalability
3.3 Graph Partitioning
4 Related Work
5 Conclusions
References
Quantifying the Performance Impact of Graph Structure on Neighbour Iteration Strategies for PageRank
1 Introduction
2 Background
2.1 PageRank
2.2 The GPU Architecture
3 Design and Implementation
3.1 Four PageRank Versions
3.2 Estimating Performance
3.3 Parallel Performance
4 Experimental Evaluation
4.1 Experimental Setup
4.2 Results
4.3 Sorted Graphs
5 Related Work
6 Conclusion
References
Accelerating Minimum Spanning Forest Computations on Multicore Platforms
1 Introduction
2 Base MSF Algorithm and Target Platform
3 Update Edges for Locality
4 Stages
5 PRAM Simulation
6 Combining Stages and PRAM Simulation
7 Conclusion and Future Work
References
Importance of Runtime Considerations in Performance Engineering of Large-Scale Distributed Graph Algorithms
1 Introduction
2 Runtime Systems
2.1 More Details About AM++
3 Algorithms
3.1 Distributed Control
3.2 Δ-Stepping
4 Application Performance Sensitivity to Runtime Features
5 Conclusions
References
Characterizing Communication Patterns of Parallel Programs Through Graph Visualization and Analysis
1 Introduction
2 Related Work
3 Complex Networks Basics
3.1 Graph Visualization and Analysis
4 Case Study: Characterizing NAS Parallel Benchmarks
4.1 Methodology
4.2 NAS Parallel Benchmark
4.3 Characterization Results
5 Discussion and Future Work
6 Conclusion
References
REPPAR - Reproducibility in Parallel Computing
Reproducible and User-Controlled Software Environments in HPC with Guix
1 Introduction
2 Rationale
3 Functional Package Management
4 Use Cases
4.1 Usage Patterns on an HPC Cluster
4.2 Customizing Packages
5 Limitations and Challenges
6 Related Work
7 Conclusion
References
Reproducibility in Practice: Lessons Learned from Research and Teaching Experiments
1 Introduction
2 Reproducibility Scenarios and Tool Support
3 Prova!: Performance Reproducibility of Various Applications!
4 Case Studies: Research and Teaching Experiments
4.1 Stencil Variants in Chapel
4.2 Non-blocking Collectives
4.3 Students' Work Replication
5 Conclusions and Future Work
References
Towards Complete Tracking of Provenance in Experimental Distributed Systems Research
1 Introduction
2 Provenance in Computer Science
2.1 General Provenance
2.2 Provenance in General Computing
2.3 Provenance in Scientific Workflows
2.4 Provenance in Control-Flows
2.5 Provenance in Experimental Distributed Systems Research
3 New Classification of Provenance
4 Design of a Provenance System
4.1 Provenance of Experiment Data
4.2 Provenance of Experiment Description
4.3 Provenance of Experiment Process
5 Conclusions and Future Work
References
Resilience - Resiliency in High Performance Computing with Clouds, Grids, and Clusters
A Case Study of Application Structure Aware Resilience Through Differentiated State Saving and Recovery
1 Introduction
2 The Libraries
3 Resilience Strategy
3.1 Saving Chombo State with GVR
3.2 Failure Scenarios and Recovery Modes
4 Tuning for a Specific Platform
5 Future Work
References
A Holistic Approach to Log Data Analysis in High-Performance Computing Systems: The Case of IBM Blue Gene/Q
1 Introduction
2 Dataset Description
3 Data Analysis
3.1 Individual Datasets
3.2 The Big Picture
4 Related Work
5 Discussion and Conclusions
References
Addressing the Last Roadblock for Message Logging in HPC: Alleviating the Memory Requirement Using Dedicated Resources
1 Introduction
2 Background and Related Work
2.1 Message-Logging Protocols
2.2 Reducing the Log Size
3 Dedicated Logger Nodes
4 Evaluation
4.1 Implementation
4.2 Experimental Setup
4.3 Dumping Overhead with Different Number of Logger Nodes
4.4 Combining Hierarchical Protocols with Logger Nodes
4.5 Dumping Overhead with Different Memory Limits
4.6 Use Case
5 Conclusion
References
Towards Understanding Post-recovery Efficiency for Shrinking and Non-shrinking Recovery
1 Introduction
2 A Synthetic Bulk-Synchronous Application
3 Experiments
3.1 Experiment Setup
3.2 Performance vs. Communication Pattern and Replacement Node Distance
3.3 Performance vs. Communication Fraction
4 Discussion
5 Related Work
6 Summary and Future Work
References
Canaries in a Coal Mine: Using Application-Level Checkpoints to Detect Memory Failures
1 Introduction
2 Checkpointing on Current Systems
3 Approach: Using Checkpoints as Failure Detectors
4 Using Learning Approaches to Classify Checkpoints
4.1 Application Checkpoint Description
4.2 Modeling Checkpoint Data
4.3 Choosing an ML Technique
4.4 Unsupervised Learning: Clustering with K-Means
4.5 Supervised Learning with Decision Tree Methods
5 Related Work
6 Conclusion
References
ROME - Runtime and Operating Systems for the Many-Core Era
Energy Characterization and Optimization of Parallel Prefix-Sums Kernels
1 Introduction
2 Implementation
2.1 CPPS Algorithm
2.2 Sequential Prefix-Sums Kernels
2.3 Thread Placement Policies
3 Discussion
4 Experiments
5 Conclusion
References
An OS-Oriented Performance Monitoring Tool for Multicore Systems
1 Introduction
2 Related Work
3 Design
3.1 Usage Models
4 Case Studies
4.1 Scheduling on Asymmetric Single-ISA Multicore Systems
4.2 Cache Monitoring
4.3 Measuring Power and Energy Consumption
5 Conclusions and Future Work
References
A Topology-Aware Performance Monitoring Tool for Shared Resource Management in Multicore Systems
1 Introduction
2 Context
3 State of the Art
4 Topology-Aware Performance Monitoring
4.1 Objectives and Features
4.2 Usage and Configuration
4.3 Implementation
5 Analyzing Tasks Concurrency Gives a Room for Thread Placement
5.1 Spreading or Packing Threads?
5.2 A Use Case of Threads' Interference Balancing, Using the Cache Miss Ratio
5.3 Experimental Conditions
5.4 How Lstopo Shows the Situation
6 Conclusion and Future Work
References
Diamond Rings: Acknowledged Event Propagation in Many-Core Processors
1 Introduction
2 Related Work
3 The Diamond Ring Structure
3.1 Extending to Arbitrary Node Counts
3.2 Numbering and Addressing Scheme
3.3 Comparison to Tree-Based Broadcast with Reduction
3.4 Root Node Overhead
4 Implementation Notes and Benchmark Variants
4.1 Communication and Task Scheduling
4.2 Benchmark Variants
5 Evaluation
5.1 Benchmark Results
5.2 Discussion of the Results
6 Conclusions
References
UCHPC - UnConventional High Performance Computing
Energy-Performance Tradeoffs for HPC Applications on Low Power Processors
1 Introduction
2 The Hardware Testbed
3 The Application Benchmark
4 Measurements
5 Results and Discussion
6 Conclusions and Future Works
References
A Cache-Aware Performance Prediction Framework for GPGPU Computations
1 Introduction
1.1 Example
1.2 Prediction of Kernel Execution Times
2 Runtime Model
2.1 Transfer of Data to and from the Device
2.2 Base Cost of Kernel Execution
2.3 Influence of the Work-Group Size
2.4 Basic Operations
2.5 Memory Accesses
3 Empirical Evaluation
4 Quantitative Evaluation
5 Related Work
6 Conclusion
References
Towards Application Variability Handling with Component Models: 3D-FFT Use Case Study
1 Introduction
2 Related Work
3 Component Models
3.1 Overview
3.2 L2C Model
4 Designing 3D-FFT Algorithms with L2C
4.1 3D-FFT Parallel Computation Methods
4.2 Basic Sequential Assembly
4.3 Parallel Assembly for Distributed Architectures
4.4 Assembly Adaptation
5 Performance and Adaptability Evaluation
5.1 Performance and Scalability Evaluation
5.2 Adaptation and Reuse Evaluation
6 Conclusion and Future Work
References
Optimized Force Calculation in Molecular Dynamics Simulations for the Intel Xeon Phi
1 Introduction
2 Short-Range Molecular Dynamics
3 Implementation
3.1 SIMD Vectorization
3.2 Shared-Memory Parallelization
3.3 MPI Parallelization
4 Results and Evaluation
4.1 Test Setup
4.2 Single Core Performance
4.3 Shared-Memory Parallelization
5 Summary
References
VHPC - Virtualization in High-Performance Cloud Computing
A Simplified TDP with Large Tables
1 Introduction
2 Related Work
3 Structures and Operation of the TDP
4 Design and Implementation of the Simplified TDP
5 Conclusion and Further Work
References
GPGPU Virtualisation with Multi-API Support Using Containers
1 Introduction
2 Background and Related Work
3 vGPGPUs as LRMS Resources
3.1 The vGPGPU Factory
3.2 vGPGPU Registry Service
3.3 LRMS Integration
4 Evaluation
5 Conclusions and Future Work
References
Performance Evaluation of Containers for HPC
1 Introduction
2 Context: Virtualization and Containers
3 Related Work
4 Experimental Evaluation of Containers
4.1 Experimental Setup
4.2 Linux Kernel Version and Oversubscription
4.3 Inter-Container Communication
4.4 Multinode Inter-Container Communication
5 Conclusions and Future Work
References
The Virtual Puppet Master: Adaptive Streaming on Top of an SDN-Enabled Virtual Infrastructure
1 Introduction
2 Software Defined Networking
3 The Virtual Puppet Master: Design Considerations
4 VPM Implementation
5 Qualitative Tests
6 Conclusion
References
Author Index
