
Euro-Par 2015: Parallel Processing Workshops

Content
- Intro
- Preface
- Organization
- Workshop Introduction and Organization
- 4th Workshop on Big Data Management in Clouds (BigDataCloud)
- First European Workshop on Parallel and Distributed Computing Education for Undergraduate Students (Euro-EDUPAR)
- 13th International Workshop on Algorithms, Models, and Tools for Parallel Computing on Heterogeneous Platforms (HeteroPar)
- Third Workshop on Large-Scale Distributed Virtual Environments (LSDVE)
- 4th International Workshop on On-Chip Memory Hierarchies and Interconnects (OMHI)
- Third Workshop on Parallel and Distributed Agent-Based Simulations (PADABS)
- First Workshop on Performance Engineering for Large-Scale Graph Analytics (PELGA)
- Second International Workshop on Reproducibility in Parallel Computing (REPPAR)
- 8th Workshop on Resiliency in High-Performance Computing in Clusters, Clouds, and Grids (Resilience)
- Third Workshop on Runtime and Operating Systems for the Many-Core Era (ROME)
- 8th Workshop on UnConventional High-Performance Computing 2015 (UCHPC)
- 10th Workshop on Virtualization in High-Performance Cloud Computing (VHPC)
- Contents
- BigDataCloud - Big Data Management in Clouds
- Distributed Range-Based Meta-Data Management for an In-Memory Storage
- 1 Introduction
- 2 DXRAM Architecture
- 2.1 Chunks
- 2.2 Super-Peer Overlay
- 3 CID-Ranges
- 3.1 CID-Tree
- 3.2 Backup Nodes Integration
- 3.3 Client-Side Caching
- 4 Evaluation
- 4.1 CID-Tree
- 4.2 Client-Side Caching
- 4.3 BG Benchmark
- 5 Related Work
- 6 Conclusions
- References
- Network-Based Data Processing Architecture for Reliable and High-Performance Distributed Storage System
- 1 Introduction
- 1.1 Background
- 1.2 Our Contribution
- 2 Related Work
- 3 System Design
- 3.1 Network-Based Data Processing Architecture
- 3.2 Overview of the System
- 3.3 Data Layout
- 3.4 Switch Architecture
- 3.5 Fallback Mode
- 3.6 Prototype Implementation Overview
- 3.7 Optimized Data Transfer and Processing with RDMA
- 4 Evaluation
- 4.1 Evaluation Target and Conditions
- 4.2 Evaluation Results
- 5 Conclusion and Future Work
- References
- File-Less Approach to Large Scale Data Management
- 1 Introduction
- 2 Related Work
- 3 Filess Vision
- 4 Filess Data Model
- 4.1 Hypergraphs
- 4.2 Overview
- 4.3 Object Composition and Decomposition
- 5 Representing Existing Data Structures and Formats in Filess
- 6 Prototype Design and Implementation
- 7 Conclusions
- References
- Euro-EDUPAR - Parallel and Distributed Computing Education for Undergraduate Students
- Parallel Computing vs. Distributed Computing: A Great Confusion? (Position Paper)
- 1 A (Very) Quick Look at Parallel Computing
- 2 What Is Distributed Computing
- 3 A Fundamental Difference Between Parallel Computing and Distributed Computing
- 4 On the Computational Side: The Hardness of Distributed Computing
- 5 Parallel vs. Distributed Computing: A Schematic View
- 6 An Approach to Teach Distributed Computing
- 7 Distributed Algorithms at the Undergraduate Level
- 8 Distributed Algorithms at the Graduate Level
- 9 When Communication Is Through a Shared Memory
- 10 When Communication Is by Message-Passing
- 11 Conclusion
- A The Non-blocking Atomic Commit Problem
- B Remark on the Notion of a Consensus Number of an Object
- References
- SAUCE: A Web-Based Automated Assessment Tool for Teaching Parallel Programming
- 1 Introduction
- 2 Related Work
- 3 Technical Aspects
- 3.1 Python
- 3.2 SAUCE Web Application
- 3.3 Learning Tools Interoperability
- 3.4 Security Considerations
- 3.5 Distributed Execution
- 4 Use Cases
- 4.1 Solving the Poisson Equation Using MPI
- 4.2 Odd-Even Sort Using OpenMP
- 4.3 Array Reversal Using CUDA
- 4.4 Grading Features
- 5 Conclusion
- References
- Teaching Parallel Programming in Interdisciplinary Studies
- 1 Introduction
- 2 Basic Concepts for Interdisciplinary Students
- 3 Parallel Programming
- 3.1 Shared Memory: OpenMP
- 3.2 Message Passing: MPI
- 3.3 GPUs: CUDA
- 3.4 Performance Analysis: Tools
- 4 Applied Modelling and Simulation
- 5 Conclusions
- References
- On-line Service for Teaching Parallel Programming
- 1 Introduction
- 2 Motivation
- 3 ZawodyWeb System
- 3.1 Overview
- 3.2 Technical Details
- 3.3 Functionality
- 4 UNICORE
- 5 ZawodyWeb Support for Parallel Computing
- 6 Supported Languages
- 6.1 OpenMP
- 6.2 MPI
- 6.3 PCJ
- 7 Results
- 7.1 Practical Evaluation
- 8 Conclusions
- References
- Challenges of a Systematic Approach to Parallel Computing and Supercomputing Education
- 1 Introduction
- 2 Supercomputing Education Infrastructure
- 3 Supercomputing Consortium of Russian Universities
- 4 Supercomputing Education National Project
- 5 Supercomputing Education in Russia's Universities Today
- 5.1 Supercomputing Education at Lomonosov Moscow State University
- 5.2 Supercomputing Education at the Lobachevsky Nizhny Novgorod State University
- 6 Supercomputer Technologies and School Education
- 7 Conclusion
- References
- Teaching Heart Modeling and Simulation on Parallel Computing Systems
- 1 Introduction
- 2 Related Work
- 3 The Course Track "Heart Modeling and Simulation on Parallel Computing Systems"
- 3.1 General Course Track Description
- 3.2 Prerequisite Courses
- 3.3 Computational Resources
- 4 Parallel and Distributed Computing Module
- 4.1 Parallel and Distributed Computing
- 4.2 GPU Programming
- 4.3 Xeon Phi Programming
- 5 Numerical Methods Module
- 5.1 Parallel Numerical Methods
- 5.2 Science Hackathon
- 6 Heart Modeling Module
- 6.1 Simulation of Living Systems
- 6.2 Modeling Heart Dynamics on Parallel Computing Systems
- 7 Discussion
- 8 Conclusion
- References
- Integration of ICT in Concurrent and Parallel Programming Lectures
- 1 Introduction
- 1.1 Environment
- 1.2 Objectives
- 1.3 Time Schedule
- 2 What Has Been Innovated?
- 2.1 Development Methodology
- 3 Results
- 3.1 Pre-assessment
- 3.2 Post-assessment
- 4 Conclusions and Future Work
- References
- Teamwork Across Disciplines: High-Performance Computing Meets Engineering
- 1 Interdisciplinary Education and Teamwork
- 1.1 Introduction
- 1.2 Challenges
- 1.3 Outline
- 2 Course Curriculum
- 2.1 Teamwork Across Disciplines: Concept
- 2.2 Realization: Turbulent Flow Simulation on HPC-Systems
- 3 Evaluation
- 4 Conclusion
- References
- An Educational Module Illustrating How Sparse Matrix-Vector Multiplication on Parallel Processors Connects to Graph Partitioning
- 1 Introduction
- 2 A Simple Sparse Matrix Data Structure
- 3 Sparse Matrix-Vector Multiplication Goes Parallel
- 4 An Undirected Graph Model for Data Partitioning
- 5 An Educational Module Illustrating the Connection
- 6 Related Work
- 7 Concluding Remarks
- References
- FERBJMON Tools - Visualizing Thread Access on Java Objects using Lightweight Runtime Monitoring
- 1 Introduction
- 2 Related Work
- 3 Java Runtime Monitoring Using FERBJMON Tools
- 3.1 Bytecode Instrumentation
- 3.2 FerbJmon Call Graph
- 3.3 FERBJMON Timeline Diagram of Thread Accesses
- 4 Examples
- 4.1 Producer and Consumer
- 4.2 Cooperative Task Execution
- 5 Performance of FerbJmon Runtime Monitoring
- 6 Conclusion
- References
- Interdisciplinary Practical Course on Parallel Finite Element Method Using HiFlow3
- 1 Introduction
- 2 HiFlow3
- 3 Practical Course on Parallel Numerics
- 4 Summary and Future Work
- References
- HeteroPar - Algorithms, Models, and Tools for Parallel Computing on Heterogeneous Platforms
- A Randomized LU-based Solver Using GPU and Intel Xeon Phi Accelerators
- 1 Introduction
- 2 Hybrid RBT Solver
- 3 RBT for Graphics Processing Units
- 3.1 Implementation
- 3.2 Performance Results
- 4 RBT for Intel Xeon Phi
- 4.1 Implementation
- 4.2 Performance Results
- 5 Conclusion
- References
- Identifying Optimization Opportunities Within Kernel Execution in GPU Codes
- 1 Introduction
- 1.1 Motivation
- 1.2 Contributions
- 2 Background
- 3 Methodology
- 3.1 Static Analysis
- 3.2 Dynamic Analysis
- 3.3 Instruction Operation Metrics
- 4 Analysis
- 4.1 Applications
- 4.2 Methodology
- 4.3 Results
- 5 Related Work
- 6 Conclusion and Future Work
- References
- Modeling Contention and Mapping Effects in Multi-core Clusters
- 1 Introduction
- 2 Related Work
- 3 Modeling Parallel Algorithms
- 4 Case Study 1: Analyzing the Effect of the Contention in Shared Memory
- 5 Case Study 2: Modeling the Mapping Effects on Multi-core Clusters
- 6 Test Platforms
- 7 Conclusions
- References
- Towards Community Detection on Heterogeneous Platforms
- 1 Introduction
- 2 Background
- 2.1 The WCC Metric
- 2.2 The Scalable Community Detection Algorithm
- 3 Design and Implementation
- 3.1 The Massively Parallel Version
- 3.2 The Heterogeneous Version
- 3.3 Automatic Partitioning
- 4 Evaluation
- 4.1 The GPU Version
- 4.2 The Heterogeneous Version
- 4.3 End-to-End Performance
- 5 Related Work
- 6 Conclusion and Future Work
- References
- A Design Proposal for a Next Generation Scientific Software Framework
- 1 Introduction
- 2 Requirements
- 3 Approach
- 3.1 Embedded Domain-specific-languages
- 3.2 Tiling
- 3.3 Task Based Runtime Support
- 3.4 Proposed Architecture
- 4 Example: Structured AMR
- 4.1 Granularities and Decomposition
- 4.2 Micro-parallelism
- 4.3 Solvers
- 5 Conclusions
- References
- Accelerating Direction-Optimized Breadth First Search on Hybrid Architectures
- 1 Introduction
- 2 Challenges and Opportunities
- 2.1 Improving Performance with Hardware Accelerators
- 2.2 Improving Performance with Direction-Optimized BFS
- 3 The Hybrid Algorithm
- 3.1 Direction-Optimized BFS for Partitioned Graphs
- 3.2 Partition Specialization
- 3.3 Switching Processing Direction for a Partitioned Graph
- 3.4 Optimizations to Improve Access Locality
- 4 Experimental Results
- 4.1 The Impact of Specialized Partitioning
- 4.2 Comparison with Past Work Using Real-World Graphs
- 4.3 The Energy Case
- 5 Summary
- References
- FiNS: A Framework for Accelerating Nested Simulations on Heterogeneous Platforms
- 1 Introduction
- 2 Financial Background
- 3 GPU Background
- 3.1 CUDA Streams and Hyper-Q
- 3.2 Multi Processing Service
- 4 Framework Architecture
- 4.1 The Framework
- 4.2 Streams in Practice
- 5 Nested Simulation for ALM Tooling: A Case Study
- 5.1 Application Description
- 5.2 Using the Framework
- 5.3 Evaluation
- 6 Related Work
- 7 Conclusion and Future Work
- References
- Communication Models Insights Meet Simulations
- 1 Introduction
- 2 Related Work
- 3 Problem Description
- 4 Simulation Framework
- 5 Evaluation
- 5.1 Platform and Jobs Description
- 5.2 Competing Heuristics
- 5.3 Homogeneous Platform Experiment
- 5.4 Heterogeneous Platform Experiments
- 6 Conclusion
- References
- LSDVE - Large Scale Distributed Virtual Environments
- Community Discovery for Interest Management in DVEs: A Case Study
- 1 Introduction
- 2 Related Work
- 3 Reference Architecture
- 3.1 Coverage Peer Sampling
- 4 Distributed Community Discovery Protocols
- 4.1 GROUP
- 4.2 Affinity Propagation
- 5 Experimental Evaluation
- 5.1 Hotspots Approximation
- 5.2 Area Coverage
- 5.3 Message Number and Computational Overhead
- 6 Conclusion
- References
- Continuation Complexity: A Callback Hell for Distributed Systems
- 1 Introduction
- 2 Continuation Complexity Problem
- 2.1 A Simple Example: Chord
- 3 Overcoming the Continuation Complexity Problem in Actor Models
- 3.1 Concurrency Control
- 4 PyActive Abstractions
- 4.1 A Complete Example: Chord
- 5 Evaluating the Expressiveness and Simplicity of Our Approach
- 6 Conclusions
- References
- Offloading Service Provisioning on Mobile Devices in Mobile Cloud Computing Environments
- 1 Introduction
- 2 Related Work
- 3 Hybrid Mobile Cloud Computing Solution for Service Provisioning
- 3.1 Resolution Process
- 3.2 Data Collection
- 3.3 Evaluation of Service Provisioning Alternatives
- 4 System Evaluation
- 4.1 Service Provisioning Time Comparison
- 4.2 Split of Service Executions in the Hybrid Approach
- 5 Conclusions
- References
- A Systematic Quality Analysis of Virtual Desktop Infrastructure Technologies
- 1 Introduction
- 2 Considered Virtual Desktop Infrastructures
- 2.1 Citrix XenDesktop
- 2.2 VMware Horizon View
- 2.3 Microsoft Virtual Desktop Infrastructure
- 3 Technology Implementation and Evaluation Setup
- 4 Benchmarking of the VDI Solutions
- 4.1 Setup: Activities Based on Identified User Types
- 4.2 Setup: Network Variations
- 4.3 Video Streaming Quality: Impact of Individual Network Factors
- 4.4 Streaming Quality of Solutions in Various Network Conditions
- 5 Conclusions
- References
- A Trustworthy Distributed Social Carpool Method
- 1 Introduction
- 2 Related Work
- 3 Carpooling Platform
- 4 The Reputation Algorithm
- 5 Security of the Scheme
- 6 The Android Application
- 7 Conclusions
- References
- OMHI - On-Chip Memory Hierarchies and Interconnects: Organization, Management and Implementation
- Efficient DVFS Operation in NoCs Through a Proper Congestion Management Strategy
- 1 Introduction
- 2 Related Work
- 3 ICARO-DVFS Implementation
- 3.1 Dynamic Voltage and Frequency Scaling
- 3.2 Voltage and Frequency Islands
- 3.3 ICARO
- 3.4 Merging ICARO with DVFS
- 3.5 Different ICARO-DVFS Alternatives
- 3.6 ICARO-DVFS Performance Analysis
- 4 Conclusions and Future Work
- References
- Superoptimizing Memory Subsystems for Multiple Objectives
- 1 Introduction
- 2 Related Work
- 3 Method
- 4 Benchmarks
- 5 Results
- 5.1 Minimizing Writes
- 5.2 Multi-objective Superoptimization
- 6 Conclusions
- References
- PADABS - Parallel and Distributed Agent-Based Simulations
- On Evaluating Graph Partitioning Algorithms for Distributed Agent Based Models on Networks
- 1 Introduction
- 1.1 Our Results
- 2 The Graph Partitioning Problem
- 3 Experiment Setting
- 4 Results
- 4.1 Analytical Results
- 4.2 Real Setting Results
- 4.3 Correlation Between Analytical and Real Setting Results
- 5 Conclusion
- References
- Distributed Agent-Based Simulation and GIS: An Experiment with the Dynamics of Social Norms
- 1 Introduction
- 2 Experiment
- 3 Results
- 3.1 Experiments Settings
- 3.2 Analytical Analysis of ABM and GIS
- 4 Conclusion
- References
- Behavioral Spherical Harmonics for Long-Range Agents' Interaction
- 1 Introduction
- 2 Background
- 3 Behavioral Spherical Harmonics
- 3.1 Spherical Harmonics
- 3.2 Projection and Reconstruction
- 3.3 Behavioral Spherical Harmonics
- 4 Implementation
- 4.1 Spatial Subdivision with Grid
- 4.2 Projection of Directionality into SH Coefficients
- 4.3 Reconstruction of the Avoidance Direction
- 5 Results
- 6 Conclusion
- References
- Graph-Based Automatic Dynamic Load Balancing for HPC Agent-Based Simulations
- 1 Introduction
- 2 3D Spatial Agent Organisation
- 3 Graph-Based Agent Partitioning
- 4 Automatic Dynamic Load Balancing
- 5 Experimental Results
- 6 Conclusion and Future Work
- References
- Preliminary Evaluation of a Parallel Trace Replay Tool for HPC Network Simulations
- 1 Introduction
- 2 Background and Related Work
- 2.1 Related Work
- 3 Design and Implementation of TraceR
- 3.1 Running TraceR in Optimistic Mode
- 4 Parameter Choices for TraceR
- 5 Experimental Setup and Configuration Parameters
- 5.1 Conservative Versus Optimistic Simulation
- 5.2 Effect of Batch Size and GVT Interval
- 5.3 Impact of Number of LPs per KP
- 6 Performance Comparison, Scaling and Validation
- 6.1 Comparison with Sequential Executions
- 6.2 Parallel Scaling and Validation of TraceR
- 7 Conclusion
- References
- Road Network Simulation Using FLAME GPU
- 1 Introduction
- 2 Related Work
- 3 Model
- 4 Implementation
- 4.1 Scalable Artificial Road Network
- 4.2 FLAME GPU
- 4.3 Implementing Gipps' Car Following Model Using FLAME GPU
- 4.4 Visualisation and Graphics Techniques
- 5 Experimental Results
- 5.1 Vehicle Scaling for Static Road Network
- 5.2 Scaling of Vehicle and Road Network
- 5.3 Visualisation
- 6 Conclusions
- References
- A Communication Schema for Parallel and Distributed Multi-agent Systems Based on MPI
- 1 Introduction
- 2 Related Work
- 3 Implementing PDMAS on HPC Platforms
- 4 Proposition
- 5 Communication Schema (Receive Message Phase)
- 6 Proxy System (Agents Update Phase)
- 7 Experimentation
- 8 Conclusion and Perspectives
- References
- Large-Scale Agent-Based Modeling with Repast HPC: A Case Study in Parallelizing an Agent-Based Model
- Abstract
- 1 Introduction
- 2 The CA-MRSA Model
- 2.1 Model Structure
- 2.2 Implementation
- 3 From the CA-MRSA Model to the Social Interaction Model
- 3.1 OpenMP
- 3.2 Parallelizing the Model with MPI
- 3.2.1 Minimizing the Data Cost
- 3.2.2 Minimizing the Sending Frequency
- 4 Conclusion
- Acknowledgements
- References
- RAMSES: Reversibility-Based Agent Modeling and Simulation Environment with Speculation-Support
- 1 Introduction
- 2 Related Work
- 3 RAMSES
- 3.1 Reference Programming Model
- 3.2 Exposed API
- 3.3 Tracking Memory Updates for Reversibility
- 3.4 Runtime Execution Support
- 4 Experimental Results
- 5 Conclusions
- References
- PELGA - Performance Engineering for Large-scale Graph Analytics
- Can Embedding Solve Scalability Issues for Mixed-Data Graph Clustering?
- Abstract
- 1 Introduction
- 2 State of the Art and Related Work
- 2.1 Graph Clustering Approaches
- 2.2 Mixed Data Type Clustering Approaches
- 2.3 K-Means Clustering Algorithm
- 2.4 Graph Embedding Systems
- 3 Graph Coordinates Approach: Embedding + Clustering
- 3.1 Embedding
- 3.2 Clustering
- 3.3 Complete Process
- 4 Scalability Testing
- 4.1 Experimental Setup
- 4.2 Results
- 4.3 Comparison with Other Methods
- 4.4 Interactivity
- 4.5 Accuracy
- 4.6 Discussion About the Parallelization of the Clustering and Embedding Processes
- 5 Conclusions and Future Work
- References
- Using the Marshall-Olkin Extended Zipf Distribution in Graph Generation
- 1 Introduction
- 2 The MOEZipf Model
- 3 Real Graphs Analysis
- 4 Generating MOEZipf Degree Samples
- 5 Scalable MOEZipf Generation with Datagen
- 6 Conclusions and Future Work
- References
- Highspeed Graph Processing Exploiting Main-Memory Column Stores
- 1 Introduction
- 2 Related Work
- 3 System Architecture
- 3.1 Columnar Graph Storage
- 3.2 Secondary Graph Index Structure
- 4 Implementation Details
- 4.1 Basic Graph Operations & Building Blocks
- 4.2 Query Implementation
- 4.3 Memory Consumption
- 5 Evaluation
- 5.1 NUMA Effects
- 5.2 Performance
- 6 Conclusion
- References
- A Multi-layer Framework for Graph Processing via Overlay Composition
- 1 Introduction
- 2 The Telos Framework
- 2.1 Protocols
- 3 Evaluation
- 3.1 Torus Overlay
- 3.2 Scalability
- 3.3 Graph Partitioning
- 4 Related Work
- 5 Conclusions
- References
- Quantifying the Performance Impact of Graph Structure on Neighbour Iteration Strategies for PageRank
- 1 Introduction
- 2 Background
- 2.1 PageRank
- 2.2 The GPU Architecture
- 3 Design and Implementation
- 3.1 Four PageRank Versions
- 3.2 Estimating Performance
- 3.3 Parallel Performance
- 4 Experimental Evaluation
- 4.1 Experimental Setup
- 4.2 Results
- 4.3 Sorted Graphs
- 5 Related Work
- 6 Conclusion
- References
- Accelerating Minimum Spanning Forest Computations on Multicore Platforms
- 1 Introduction
- 2 Base MSF Algorithm and Target Platform
- 3 Update Edges for Locality
- 4 Stages
- 5 PRAM Simulation
- 6 Combining Stages and PRAM Simulation
- 7 Conclusion and Future Work
- References
- Importance of Runtime Considerations in Performance Engineering of Large-Scale Distributed Graph Algorithms
- 1 Introduction
- 2 Runtime Systems
- 2.1 More Details About AM++
- 3 Algorithms
- 3.1 Distributed Control
- 3.2 Δ-Stepping
- 4 Application Performance Sensitivity to Runtime Features
- 5 Conclusions
- References
- Characterizing Communication Patterns of Parallel Programs Through Graph Visualization and Analysis
- 1 Introduction
- 2 Related Work
- 3 Complex Networks Basics
- 3.1 Graph Visualization and Analysis
- 4 Case Study: Characterizing NAS Parallel Benchmarks
- 4.1 Methodology
- 4.2 NAS Parallel Benchmark
- 4.3 Characterization Results
- 5 Discussion and Future Work
- 6 Conclusion
- References
- REPPAR - Reproducibility in Parallel Computing
- Reproducible and User-Controlled Software Environments in HPC with Guix
- 1 Introduction
- 2 Rationale
- 3 Functional Package Management
- 4 Use Cases
- 4.1 Usage Patterns on an HPC Cluster
- 4.2 Customizing Packages
- 5 Limitations and Challenges
- 6 Related Work
- 7 Conclusion
- References
- Reproducibility in Practice: Lessons Learned from Research and Teaching Experiments
- 1 Introduction
- 2 Reproducibility Scenarios and Tool Support
- 3 Prova!: Performance Reproducibility of Various Applications!
- 4 Case Studies: Research and Teaching Experiments
- 4.1 Stencil Variants in Chapel
- 4.2 Non-blocking Collectives
- 4.3 Students' Work Replication
- 5 Conclusions and Future Work
- References
- Towards Complete Tracking of Provenance in Experimental Distributed Systems Research
- 1 Introduction
- 2 Provenance in Computer Science
- 2.1 General Provenance
- 2.2 Provenance in General Computing
- 2.3 Provenance in Scientific Workflows
- 2.4 Provenance in Control-Flows
- 2.5 Provenance in Experimental Distributed Systems Research
- 3 New Classification of Provenance
- 4 Design of a Provenance System
- 4.1 Provenance of Experiment Data
- 4.2 Provenance of Experiment Description
- 4.3 Provenance of Experiment Process
- 5 Conclusions and Future Work
- References
- Resilience - Resiliency in High Performance Computing with Clouds, Grids, and Clusters
- A Case Study of Application Structure Aware Resilience Through Differentiated State Saving and Recovery
- 1 Introduction
- 2 The Libraries
- 3 Resilience Strategy
- 3.1 Saving Chombo State with GVR
- 3.2 Failure Scenarios and Recovery Modes
- 4 Tuning for a Specific Platform
- 5 Future Work
- References
- A Holistic Approach to Log Data Analysis in High-Performance Computing Systems: The Case of IBM Blue Gene/Q
- 1 Introduction
- 2 Dataset Description
- 3 Data Analysis
- 3.1 Individual Datasets
- 3.2 The Big Picture
- 4 Related Work
- 5 Discussion and Conclusions
- References
- Addressing the Last Roadblock for Message Logging in HPC: Alleviating the Memory Requirement Using Dedicated Resources
- 1 Introduction
- 2 Background and Related Work
- 2.1 Message-Logging Protocols
- 2.2 Reducing the Log Size
- 3 Dedicated Logger Nodes
- 4 Evaluation
- 4.1 Implementation
- 4.2 Experimental Setup
- 4.3 Dumping Overhead with Different Number of Logger Nodes
- 4.4 Combining Hierarchical Protocols with Logger Nodes
- 4.5 Dumping Overhead with Different Memory Limits
- 4.6 Use Case
- 5 Conclusion
- References
- Towards Understanding Post-recovery Efficiency for Shrinking and Non-shrinking Recovery
- 1 Introduction
- 2 A Synthetic Bulk-Synchronous Application
- 3 Experiments
- 3.1 Experiment Setup
- 3.2 Performance vs. Communication Pattern and Replacement Node Distance
- 3.3 Performance vs. Communication Fraction
- 4 Discussion
- 5 Related Work
- 6 Summary and Future Work
- References
- Canaries in a Coal Mine: Using Application-Level Checkpoints to Detect Memory Failures
- 1 Introduction
- 2 Checkpointing on Current Systems
- 3 Approach: Using Checkpoints as Failure Detectors
- 4 Using Learning Approaches to Classify Checkpoints
- 4.1 Application Checkpoint Description
- 4.2 Modeling Checkpoint Data
- 4.3 Choosing an ML Technique
- 4.4 Unsupervised Learning: Clustering with K-Means
- 4.5 Supervised Learning with Decision Tree Methods
- 5 Related Work
- 6 Conclusion
- References
- ROME - Runtime and Operating Systems for the Many-Core Era
- Energy Characterization and Optimization of Parallel Prefix-Sums Kernels
- 1 Introduction
- 2 Implementation
- 2.1 CPPS Algorithm
- 2.2 Sequential Prefix-Sums Kernels
- 2.3 Thread Placement Policies
- 3 Discussion
- 4 Experiments
- 5 Conclusion
- References
- An OS-Oriented Performance Monitoring Tool for Multicore Systems
- 1 Introduction
- 2 Related Work
- 3 Design
- 3.1 Usage Models
- 4 Case Studies
- 4.1 Scheduling on Asymmetric Single-ISA Multicore Systems
- 4.2 Cache Monitoring
- 4.3 Measuring Power and Energy Consumption
- 5 Conclusions and Future Work
- References
- A Topology-Aware Performance Monitoring Tool for Shared Resource Management in Multicore Systems
- 1 Introduction
- 2 Context
- 3 State of the Art
- 4 Topology-Aware Performance Monitoring
- 4.1 Objectives and Features
- 4.2 Usage and Configuration
- 4.3 Implementation
- 5 Analyzing Tasks Concurrency Gives a Room for Thread Placement
- 5.1 Spreading or Packing Threads?
- 5.2 A Use Case of Threads' Interference Balancing, Using the Cache Miss Ratio
- 5.3 Experimental Conditions
- 5.4 How Lstopo Shows the Situation
- 6 Conclusion and Future Work
- References
- Diamond Rings: Acknowledged Event Propagation in Many-Core Processors
- 1 Introduction
- 2 Related Work
- 3 The Diamond Ring Structure
- 3.1 Extending to Arbitrary Node Counts
- 3.2 Numbering and Addressing Scheme
- 3.3 Comparison to Tree-Based Broadcast with Reduction
- 3.4 Root Node Overhead
- 4 Implementation Notes and Benchmark Variants
- 4.1 Communication and Task Scheduling
- 4.2 Benchmark Variants
- 5 Evaluation
- 5.1 Benchmark Results
- 5.2 Discussion of the Results
- 6 Conclusions
- References
- UCHPC - UnConventional High Performance Computing
- Energy-Performance Tradeoffs for HPC Applications on Low Power Processors
- 1 Introduction
- 2 The Hardware Testbed
- 3 The Application Benchmark
- 4 Measurements
- 5 Results and Discussion
- 6 Conclusions and Future Works
- References
- A Cache-Aware Performance Prediction Framework for GPGPU Computations
- 1 Introduction
- 1.1 Example
- 1.2 Prediction of Kernel Execution Times
- 2 Runtime Model
- 2.1 Transfer of Data to and from the Device
- 2.2 Base Cost of Kernel Execution
- 2.3 Influence of the Work-Group Size
- 2.4 Basic Operations
- 2.5 Memory Accesses
- 3 Empirical Evaluation
- 4 Quantitative Evaluation
- 5 Related Work
- 6 Conclusion
- References
- Towards Application Variability Handling with Component Models: 3D-FFT Use Case Study
- 1 Introduction
- 2 Related Work
- 3 Component Models
- 3.1 Overview
- 3.2 L2C Model
- 4 Designing 3D-FFT Algorithms with L2C
- 4.1 3D-FFT Parallel Computation Methods
- 4.2 Basic Sequential Assembly
- 4.3 Parallel Assembly for Distributed Architectures
- 4.4 Assembly Adaptation
- 5 Performance and Adaptability Evaluation
- 5.1 Performance and Scalability Evaluation
- 5.2 Adaptation and Reuse Evaluation
- 6 Conclusion and Future Work
- References
- Optimized Force Calculation in Molecular Dynamics Simulations for the Intel Xeon Phi
- 1 Introduction
- 2 Short-Range Molecular Dynamics
- 3 Implementation
- 3.1 SIMD Vectorization
- 3.2 Shared-Memory Parallelization
- 3.3 MPI Parallelization
- 4 Results and Evaluation
- 4.1 Test Setup
- 4.2 Single Core Performance
- 4.3 Shared-Memory Parallelization
- 5 Summary
- References
- VHPC - Virtualization in High-Performance Cloud Computing
- A Simplified TDP with Large Tables
- 1 Introduction
- 2 Related Work
- 3 Structures and Operation of the TDP
- 4 Design and Implementation of the Simplified TDP
- 5 Conclusion and Further Work
- References
- GPGPU Virtualisation with Multi-API Support Using Containers
- 1 Introduction
- 2 Background and Related Work
- 3 vGPGPUs as LRMS Resources
- 3.1 The vGPGPU Factory
- 3.2 vGPGPU Registry Service
- 3.3 LRMS Integration
- 4 Evaluation
- 5 Conclusions and Future Work
- References
- Performance Evaluation of Containers for HPC
- 1 Introduction
- 2 Context: Virtualization and Containers
- 3 Related Work
- 4 Experimental Evaluation of Containers
- 4.1 Experimental Setup
- 4.2 Linux Kernel Version and Oversubscription
- 4.3 Inter-Container Communication
- 4.4 Multinode Inter-Container Communication
- 5 Conclusions and Future Work
- References
- The Virtual Puppet Master: Adaptive Streaming on Top of an SDN-Enabled Virtual Infrastructure
- 1 Introduction
- 2 Software Defined Networking
- 3 The Virtual Puppet Master: Design Considerations
- 4 VPM Implementation
- 5 Qualitative Tests
- 6 Conclusion
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (Kindle: limited support only).
The file format PDF always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). Unfortunately, on the small screens of e-readers or smartphones, PDFs are rather annoying, requiring too much scrolling.
This eBook uses Watermark-DRM, a "soft" copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, there is a personalised watermark embedded in the eBook that can be used to identify the purchaser of the eBook in the event of misuse and to provide evidence for legal purposes.