
Euro-Par 2011 Parallel Processing

Content
- Title
- Preface
- Organization
- Table of Contents
- Topic 1: Support Tools and Environments
- Introduction
- Run-Time Automatic Performance Tuning for Multicore Applications
- Introduction
- The Perpetuum Run-Time Application Tuner
- Preparing Applications for Online Tuning
- Perpetuum in Action: Automated Online-Tuning in Parallel Compression
- Environment
- Scenario 1: Tuning a Single Process
- Scenario 2: Simultaneously Auto-tuning Two Processes
- Scenario 3: Simultaneously Auto-tuning Two Processes Starting with a Time Lag
- Summary
- Automated Online-Tuning in Parallel Video Processing
- Scenario 1: Tuning a Single Process
- Scenario 2: Simultaneously Tuning Two Processes Starting with a Time Lag
- Related Work
- Conclusion
- References
- Exploiting Cache Traffic Monitoring for Run-Time Race Detection
- Introduction
- Assumptions and Requirements
- Software
- Hardware
- Monitoring Cache Traffic to Detect and Heal Races
- Race Detection
- Race Healing
- The Detector
- Evaluation
- Setup
- Results
- Related Work
- Conclusion
- References
- Accelerating Data Race Detection with Minimal Hardware Support
- Introduction
- Background
- Minimal Hardware Support for Data Race Detection
- AccessedBefore (AccB) Algorithm
- Sources of Inaccuracy
- Implementation
- Hardware Support
- Software Layer
- Optimizations
- System Issues
- Experimental Setup
- Evaluation
- AccB versus HapB
- Overheads Characterization
- Accuracy Characterization
- Related Work
- Conclusions
- References
- Quantifying the Potential Task-Based Dataflow Parallelism in MPI Applications
- Introduction
- SMPSs Programming Model
- Motivation
- Framework
- Input Code
- Code Translator
- Tracer
- Replay Simulator
- Experiments
- Results
- Related Work
- Conclusion
- References
- Event Log Mining Tool for Large Scale HPC Systems
- Introduction
- Related Work
- Methodology
- Offline Clustering
- Splitting Process
- Output
- Online Clustering
- Log Files
- Results
- Offline
- Online
- Conclusion and Future Work
- References
- Reducing the Overhead of Direct Application Instrumentation Using Prior Static Analysis
- Introduction
- Related Work
- A Configurable Instrumenter
- Adapter Specification
- Filter Specification
- Filter Criteria
- Evaluation
- Conclusion and Future Work
- References
- Topic 2: Performance Prediction and Evaluation
- Introduction
- Reducing Energy Usage with Memory and Computation-Aware Dynamic Frequency Scaling
- Introduction
- Methodology
- Benchmarking for Power and Performance
- Application Characterization
- Experimental Results
- Drawing Conclusions about System Behavior
- Energy-Optimal Clock Frequency Selection
- Technique Validation
- Related Work
- Future Work
- Conclusions
- References
- A Contention-Aware Performance Model for HPC-Based Networks: A Case Study of the InfiniBand Network
- Introduction
- Background
- Elements Influencing Network Contention
- Methodology
- Dynamic Contention Graph
- Sequence of Linear Models
- Approximation of Penalty Coefficients
- Modelling Penalty Coefficients over InfiniBand
- InfiniBand Network Testbed
- Penalty Coefficients and Model for InfiniBand
- Examples and Validation
- Conclusion and Future Work
- References
- Using the Last-Mile Model as a Distributed Scheme for Available Bandwidth Prediction
- Introduction
- Related Works
- Latency Estimation
- Bandwidth Estimation
- Last-Mile Bandwidth Prediction Model
- Last-Mile Model
- Initial Values
- Iterative Procedure
- Evaluation
- Methodology
- Parameter Tuning
- Comparison Methods
- Evaluation Results
- Concluding Remarks
- References
- Self-stabilization versus Robust Self-stabilization for Clustering in Ad-Hoc Network
- Introduction
- Overview of the Studied Clustering Protocols
- Model and Simulation Remarks
- Observed Metrics
- Simulation Results and Performances Analysis
- Average Number of Cluster-Heads
- Availability of Minimum Service
- Availability of Optimum Service
- Concluding Remarks
- References
- Multilayer Cache Partitioning for Multiprogram Workloads
- Introduction
- Motivational Example for Multilayer Partitioning
- Dynamic Performance Model
- Proposed Partitioning Algorithm
- Experimental Evaluation
- Implementation and Setup
- Results
- Concluding Remarks
- References
- Backfilling with Guarantees Granted upon Job Submission
- Introduction
- Algorithms
- Prioritized Compression
- Delayed Compression
- Related Work
- Experimental Results
- Increasing Responsiveness
- Favoring Wide Jobs
- Scheduler Running Time
- Discussion
- References
- Topic 3: Scheduling and Load Balancing
- Introduction
- Greedy "Exploitation" Is Close to Optimal on Node-Heterogeneous Clusters
- Introduction
- Formal Details
- Work Production under the LIFO and FIFO Protocols
- The LIFO Protocol Is Approximately Optimal
- A Lower Bound on the LIFO Work Production W(L)(C; L)
- An Upper Bound on the FIFO Work Production W(F)(C; L)
- The LIFO-FIFO Bounding Ratio
- Conclusions
- References
- Scheduling JavaSymphony Applications on Many-Core Parallel Computers
- Introduction
- Related Work
- JavaSymphony
- JavaSymphony Scheduler
- System Architecture
- Scheduling Methodology
- Algorithm
- Experiments
- Experimental Methodology
- Communication-Intensive Applications
- Training Experiments
- Validation Experiments
- Computation-Intensive Applications
- Training Experiments
- Validation Experiments
- Conclusions
- Assessing the Computational Benefits of AREA-Oriented DAG-Scheduling
- Introduction
- Background
- Finding Good AO-Schedules Efficiently
- Experiments to Assess the Quality of aoh
- Experimental Design
- Experimental Methodology
- Experimental Results and Discussion
- Conclusion
- Analysis and Modeling of Social Influence in High Performance Computing Workloads
- Introduction
- Data Sources
- Social Influence Model
- Analysis of Social Influence
- Community Extraction from HPC Workloads
- Power-Law Distribution of Discovered Communities
- Design of Online Learning Mechanism
- Related Work
- Conclusions and Future Work
- Work Stealing for Multi-core HPC Clusters
- Introduction
- Work Stealing
- Design for Our Approach
- Shared Memory Design
- Distributed Memory Design
- Combined Approach
- Evaluation
- UTS
- Results
- Related Work
- Conclusion and Future Work
- A Dynamic Power-Aware Partitioner with Task Migration for Multicore Embedded Systems
- Introduction
- Related Work
- System Model
- Task Real-Time Behavior
- Power-Aware Scheduler
- Partitioning Heuristics with Task Migration
- Extending Worst Fit to Support Task Migration
- Dynamic Partitioner
- Experimental Results
- Impact of Applying Migrations at Different Points of Time
- Comparing DP versus WF Variants
- Conclusions
- Exploiting Thread-Data Affinity in OpenMP with Data Access Patterns
- The Data Access Pattern Approach
- Data Access Pattern Definition
- Runtime Extensions to Exploit Patterns
- Iteration Space Partitioning
- A Dynamic Scheduling Policy for Pattern Enabled OpenMP Runtimes
- Work Stealing Strategy
- Experimental Results
- Benchmark Suite
- Performance Analysis
- Remote Memory Access Analysis
- Related Work
- Conclusions
- Workload Balancing and Throughput Optimization for Heterogeneous Systems Subject to Failures
- Introduction
- Framework and Optimization Problems
- Complexity Results
- Complexity of the MinPer (*,fi,*) Problems
- Complexity of the MinPer (*,fi,u,*) Problems
- Heuristics and Simulations
- Polynomial Time Heuristics
- Simulations
- Conclusion
- On the Utility of DVFS for Power-Aware Job Placement in Clusters
- Introduction
- Related Work
- Power-Aware Job Placement with DVFS
- Problem Statement
- Problem Formulation
- DVFS/DFS Model
- Numerical Results
- Experimental Methodology
- Results for Small Instances
- Results for Large Instances
- Conclusion
- Topic 4: High-Performance Architecture and Compilers
- Introduction
- Filtering Directory Lookups in CMPs with Write-Through Caches
- Introduction
- Motivation
- Filtering Mechanism
- Overview
- Filter Operation
- Filter States
- Filter Overhead
- Evaluation
- Chip Multiprocessor Model
- Methodology
- Filter Coverage
- Performance
- Power Consumption
- Related Work
- Conclusions
- References
- FELI: HW/SW Support for On-Chip Distributed Shared Memory in Multicores
- Introduction
- On-Chip Distributed Shared Memory
- Chip MultiProcessor Architectures
- Single Global Address Space On-Chip
- FELI - Operating System Support for Locality Management
- L0 Cache
- Discussion on DSM Architecture Parameters
- Methodology
- Experimental Evaluation
- Related Work
- Conclusions
- References
- Token3D: Reducing Temperature in 3D Die-Stacked CMPs through Cycle-Level Power Control Mechanisms
- Introduction
- Background and Related Work
- Power and Thermal Control in Microprocessors
- Building a 3D Die-Stacked Processor
- 3D Integration Technology
- Thermal Control in 3D Die-Stacked Processors
- Token3D: Balancing Temperature on 3D-Stacked Designs
- Token3D Implementation Details
- Experimental Results
- Simulation Environment
- Effects of Token3D on Peak Temperature
- Further Temperature Optimizations
- Conclusions
- References
- Bandwidth Constrained Coordinated HW/SW Prefetching for Multicores
- Introduction
- Background and Methodology
- Prefetching
- Experimental Setup
- Empirical Motivation
- Prefetching Benefits
- Off-Chip Bandwidth Effects
- Prefetch Request Priority
- Bandwidth Aware Prefetching
- Core-Level Prefetch Manager
- Prefetch Levels
- Global Prefetch Manager
- Experimental Evaluation
- Related Work
- Concluding Remarks
- References
- Unified Locality-Sensitive Signatures for Transactional Memory
- Introduction
- Background and Related Work
- Unified Signature Design
- Hardware Evaluation
- False Positive Analysis
- Evaluation
- Methodology
- Unified Signature Results
- Unified Locality-Sensitive Signature Results
- Conclusions
- References
- Using Runtime Activity to Dynamically Filter Out Inefficient Data Prefetches
- Introduction
- Related Works
- Correlation between Runtime Activity and Prefetch Efficiency
- Adaptive Prefetching Method Based on Runtime Activity
- Experimental Evaluation
- Experimental Environment
- Experimental Results
- Conclusion
- References
- Topic 5: Parallel and Distributed Data Management
- Introduction
- Distributed Scalable Collaborative Filtering Algorithm
- Introduction
- Related Work
- Background and Notation
- Optimized Distributed Co-clustering Algorithm
- Parallel Time Complexity Analysis
- Optimum Thread Distribution
- Results and Analysis
- Strong Scalability
- Weak Scalability
- Data Scalability
- Conclusions and Future Work
- References
- Compressing the Incompressible with ISABELA: In-situ Reduction of Spatio-temporal Data
- Introduction
- A Motivating Example
- Problem Statement
- Theory and Methodology
- Sorting-Based Data Transformation
- Cubic B-Splines Fitting
- Maximizing Compression Ratio via Window Splitting
- Error Quantization for Guaranteed Point-by-Point Accuracy
- Exploiting Δ-Encoding for Temporal Index Compression
- Results
- Per Window Accuracy
- Trade-Off between Compression and Per Point Accuracy
- Effect of Δ-encoding on Index Compression
- Compression Time
- Performance for Fixed Compression
- Related Work
- Summary
- References
- kNN Query Processing in Metric Spaces Using GPUs
- Introduction
- Similarity Search Background and Related Work
- List of Clusters (LC)
- Sparse Spatial Selection (SSS-Index)
- Graphic Processing Units (GPU)
- GPU Mapping of k-Nearest Neighbor Algorithms
- Exhaustive Search Algorithm
- LC
- SSS-Index
- Experimental Results
- Conclusions
- References
- An Evaluation of Fault-Tolerant Query Processing for Web Search Engines
- Introduction
- Indexing and Ranking
- Experimental Framework
- Process-Oriented Discrete-Event Simulator
- Simulating Failures
- Simulator Validation
- Comparative Evaluation
- Concluding Remarks
- References
- Topic 6: Grid, Cluster and Cloud Computing
- Introduction
- Self-economy in Cloud Data Centers: Statistical Assignment and Migration of Virtual Machines
- Introduction
- Assignment and Migration of Virtual Machines
- Assignment Procedure
- Migration Procedure
- Performance Evaluation
- Related Work
- Conclusion and Future Work
- References
- An Adaptive Load Balancing Algorithm with Use of Cellular Automata for Computational Grid Systems
- Introduction
- Cellular Automata
- The Proposed Load Balancing Algorithm
- Simulation and Results
- Simulation Model
- Conclusion
- References
- Shrinker: Improving Live Migration of Virtual Clusters over WANs with Distributed Data Deduplication and Content-Based Addressing
- Introduction
- Background and Related Work
- Architecture of Shrinker
- Architecture Overview
- Security Considerations
- Implementation and Performance Evaluation
- Implementation
- Evaluation Methodology
- Performance Results
- Conclusion
- References
- Maximum Migration Time Guarantees in Dynamic Server Consolidation for Virtualized Data Centers
- Introduction
- Related Work
- Server Consolidation Algorithm
- First Phase: Minimizing Migration Time
- Second Phase: Minimizing the Number of Physical Servers
- Evaluation
- Conclusion and Future Work
- References
- Enacting SLAs in Clouds Using Rules
- Introduction
- Related Work
- Escalation Levels
- Rule-Based Approach for VM Level
- Prerequisites
- Design and Implementation
- Evaluation
- Utility-Driven Evaluation
- Performance-Driven Evaluation
- Conclusion and Outlook
- References
- DEVA: Distributed Ensembles of Virtual Appliances in the Cloud
- Introduction
- System Overview
- DEVA Manager
- Mapping of DEVAs
- DEVAs across Heterogeneous Resources
- DEVA Agents
- Experimental Results
- Overhead Measurement
- Isolation and QoS Conservation
- Use of Heterogeneous DEVAs and Resources
- Related Work
- Conclusions and Future Work
- References
- Benchmarking Grid Information Systems
- Introduction
- Related Work
- Methodology
- Benchmarking MDS and BDII
- Background
- Experiment Setup
- Query Response Time
- Quality of Information
- Discussion
- Conclusion
- References
- Green Cloud Framework for Improving Carbon Efficiency of Clouds
- Introduction
- Related Work
- Carbon Aware Green Cloud Architecture
- Third Party: Green Offer Directory and Carbon Emission Directory
- User: Green Broker
- Provider: Green Middleware
- Case Study: IaaS Cloud
- Carbon Efficient Green Policy (CEGP)
- Performance Evaluation and Results
- Comparison of CEGP with Performance-Based Algorithm (EST)
- Effect of Relationship between CO2 Emission Rate and Datacenter Power Efficiency DCiE
- Conclusion
- References
- Optimizing Multi-deployment on Clouds by Means of Self-adaptive Prefetching
- Introduction
- Our Approach
- Design Principles
- Architecture
- Implementation
- Experimental Evaluation
- Experimental Setup
- Performance of Multi-deployment
- Related Work
- Conclusions
- References
- Topic 7: Peer to Peer Computing
- Introduction
- Combining Mobile and Cloud Storage for Providing Ubiquitous Data Access
- Introduction
- System Design
- Architecture
- Synchronization Process
- Advanced Mechanism for Using Mobile Phones
- Minimizing Communications
- Improving Storage Usage
- Evaluation
- Importance of DSV and DTC
- Performance Impact of the DSV and DTC
- Related Work
- Final Remarks
- References
- Asynchronous Peer-to-Peer Data Mining with Stochastic Gradient Descent
- Introduction
- System and Data Model
- Background
- Related Work
- The Algorithm
- Experimental Results
- Scenarios
- Metrics
- Results
- Conclusions
- References
- Evaluation of P2P Systems under Different Churn Models: Why We Should Bother
- Introduction
- Related Work
- Models and Distributions
- Churn Models
- ON/OFF Distributions
- Comparative Analysis
- Poissonity in Arrivals
- Availability Inter-dependence: A First Difference
- Reliability: A Second Difference
- Conclusions
- References
- Topic 8: Distributed Systems and Algorithms
- Introduction
- Productive Cluster Programming with OmpSs
- Introduction
- OmpSs: From Multicores to Clusters
- Overview
- Example
- Implementation
- Evaluation
- Methodology
- Results
- Related Work
- Conclusions and Future Work
- References
- On the Use of Cluster-Based Partial Message Logging to Improve Fault Tolerance for MPI HPC Applications
- Introduction
- Context
- Communication Patterns
- Partitioning for Partial Message Logging Protocols
- Two Possible Approaches
- Bisection-Based Partitioning
- Evaluation
- Defining a Cost Function
- Results
- Conclusion
- References
- Object Placement for Cooperative Caches with Bandwidth Constraints
- Introduction
- System Model and Problem Formulation
- Hardness Result and Approximation Algorithm
- Hardness Proof
- Efficient Placement Algorithm
- Approximation Ratio
- Extensions to Algorithm PA
- Empirical Evaluation
- Conclusion
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format displays each book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" form of copy protection. There are no technical restrictions that prevent illegal distribution. However, a personalised watermark is embedded in the eBook; it can be used to identify the purchaser in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.