High Performance Computing

Name: High Performance Computing | 34th International Conference, ISC High Performance 2019, Frankfurt/Main, Germany, June 16-20, 2019, Proceedings
Brand: Springer
Price: 69.54 EUR
Availability: Online only

34th International Conference, ISC High Performance 2019, Frankfurt/Main, Germany, June 16-20, 2019, Proceedings

Michèle Weiland, Guido Juckeland, Carsten Trinitis, Ponnuswamy Sadayappan (Editors)

Springer (Publisher)

Published on 5 June 2019

XVI, 352 pages

E-Book

PDF with digital watermarking

978-3-030-20656-7 (ISBN)

€69.54 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Intro
Preface
Organization
Contents
Architectures, Networks and Infrastructure
Evaluating Quality of Service Traffic Classes on the Megafly Network
1 Introduction
2 Exploring Quality of Service on HPC Networks
3 Evaluation Methodology
3.1 HPC Simulation Environment
3.2 Topology and Routing Description
3.3 Network Configurations
3.4 Workloads
3.5 Rank-to-Node Mappings
4 Quantifying Interference on 1-D Dragonfly and Megafly Networks
5 Evaluating Quality of Service on Megafly Networks
5.1 QoS Mechanism I: Prioritizing Entire Applications
5.2 QoS Mechanism II: Prioritizing and Guaranteeing Bandwidth to Latency-Sensitive Operations
5.3 Applying QoS Mechanisms to Multiple Application Workloads in Parallel
6 Related Work
7 Discussion and Conclusion
References
Artificial Intelligence and Machine Learning
Densifying Assumed-Sparse Tensors
1 Introduction
2 Background
3 Issues with Scaling the Transformer Model
4 Densifying Assumed-Sparse Tensors
5 Experimental Results
5.1 Weak Scaling Performance
5.2 Strong Scaling
5.3 Model Accuracy
6 Discussion
7 Future Work and Conclusion
References
Learning Neural Representations for Predicting GPU Performance
1 Introduction
2 Background and Motivation
2.1 Related Work
2.2 Explicit Features
2.3 Representation Learning
2.4 Collaborative Filtering
3 Prediction Model
3.1 Multi-layer Perceptron Model
3.2 Multiple Training Objectives
3.3 Automated Architecture Search
4 Experiment Setup
4.1 Machine Specification
4.2 Benchmarks
4.3 Methodology
5 Results and Discussions
5.1 Performance of Matrix Factorization (R1)
5.2 Performance of Multi-layer Perceptron (R2)
5.3 Training with Additional Metrics (R3)
6 Conclusions
References
Data, Storage and Visualization
SLOPE: Structural Locality-Aware Programming Model for Composing Array Data Analysis
1 Introduction
2 Preliminaries
2.1 Multidimensional Array
2.2 User-Defined Function and Programming Model
3 SLOPE Programming Model
3.1 Abstract Data Type-Stencil
3.2 SLOPE Programming Model
3.3 Example Data Analysis Using SLOPE
4 Parallel Execution Engine
4.1 Overview of Parallel Execution Engine
4.2 Data Partitioning and Halo Layer
4.3 Data and Computing Scheduling
4.4 Output Array Dimension
4.5 Advanced Features
4.6 Implementation of SLOPE
5 Evaluation
5.1 Evaluation Using Synthetic Data Analysis
5.2 Evaluation for SLOPE and Spark Using Real Applications
6 Related Work
7 Conclusions and Future Work
References
A Near-Data Processing Server Architecture and Its Impact on Data Center Applications
1 Introduction
2 Related Work
3 NDP Server Architecture
3.1 The Architecture of a Conventional Server
3.2 The New NDP Server Architecture
4 Implementations
4.1 Implementation Methodology
4.2 Implementation of SANS and SFNS
4.3 Implementation of the Applications
5 Evaluation
5.1 Evaluation of FFT & LC & HE
5.2 Evaluation of the Three k-NN Applications
5.3 Impact of NDP on Data Center Applications
6 Conclusions
References
Comparing the Efficiency of In Situ Visualization Paradigms at Scale
1 Introduction
2 Related Work
3 Experimental Overview
4 Results
4.1 Time to Solution
4.2 Total Cost
4.3 Scalability of Visualization Algorithms
5 Discussion
5.1 In-line and In-transit Cost Models
5.2 In-Line and In-Transit Time to Solution
6 Conclusion and Future Directions
References
Emerging Technologies
Layout-Aware Embedding for Quantum Annealing Processors
1 Introduction
2 Background
2.1 Quantum Annealing
2.2 Minor-Embedding
3 Layout-Awareness
4 Layout-Aware Embedding Methods
4.1 Global Placement
4.2 Diffusion-Based Migration
4.3 Disperse Router
4.4 Combined Approach
4.5 Related Work
5 Evaluation
5.1 Benchmark Problems: Quantum-Dot Cellular Automata
5.2 Embedding Results
5.3 Sampling Results
6 Conclusions
References
HPC Algorithms
Toward Efficient Architecture-Independent Algorithms for Dynamic Programs
1 Introduction
2 Multi-way Recursive Divide and Conquer
2.1 r-way R-DP Design
2.2 Additional r-way R-DP Algorithms
3 External-Memory GPU Algorithms
3.1 GPU Computing Model
3.2 Related Work (GPU)
3.3 GPU Algorithm Design
3.4 I/O Complexities
3.5 GPU Experimental Results
4 Distributed-Memory Algorithms
4.1 Distributed-Memory r-way R-DP
4.2 Bandwidth and Latency Lower Bounds
4.3 Related Work (Distributed Memory)
4.4 Distributed Memory Experimental Results
5 Conclusion
References
HPC Applications
Petaflop Seismic Simulations in the Public Cloud
1 Introduction and Related Work
2 Earthquake Simulations
2.1 Fused Forward Simulations
2.2 Model Setup
2.3 Shared Memory Dynamic Load Balancing
3 Cloud Setup
4 Benchmarking the Cloud
4.1 Floating Point Throughput
4.2 Memory
4.3 Interconnect
4.4 Single-Node Application Performance
5 Elastic Scalability
6 Discussion and Conclusion
References
MaLTESE: Large-Scale Simulation-Driven Machine Learning for Transient Driving Cycles
1 Introduction
2 Surrogate Modeling for Transient Drive Cycle Simulation
2.1 Engine Simulator
2.2 ML-Based Surrogate Modeling
3 Experimental Results
3.1 Setup
3.2 Training Data Generation at Scale
3.3 Comparison of ML Methods
3.4 Impact of Training Set Size
3.5 Model Adaptation Using Transfer Learning and Retraining
4 Related Work
5 Conclusion
A Appendix
References
Performance Modeling and Measurement
PerfMemPlus: A Tool for Automatic Discovery of Memory Performance Problems
1 Introduction
2 Related Work
3 Automatic Discovery of Performance Problems
3.1 False Sharing
3.2 Main Memory Bandwidth
4 PerfMemPlus Implementation
4.1 Profiling Tool
4.2 Viewer
5 Evaluation
5.1 The PARSEC Benchmarks
5.2 Canneal
5.3 Streamcluster
5.4 Freqmine
5.5 Mnist
5.6 N3LP
5.7 Overhead
6 Conclusion and Future Work
References
GPUMixer: Performance-Driven Floating-Point Tuning for GPU Scientific Applications
1 Introduction
2 Related Work
3 Background and Overview
3.1 Example of Mixed-Precision Tuning
3.2 Configurations
3.3 Overview of Our Approach
4 Approach
4.1 Kernel Intermediate Representation
4.2 FISet Design
4.3 FISet Illustration
4.4 FISet Properties and Algorithm
4.5 Shadow Computations
4.6 Limitations
5 Evaluation
5.1 Comparison Approach: Precimonious
5.2 CUDA Programs
5.3 Overhead of Shadow Computations
5.4 Threshold Settings
5.5 Case 1: LULESH
5.6 Case 2: CoMD
5.7 Case 3: CFD
6 Conclusions
References
Performance Exploration Through Optimistic Static Program Annotations
1 Introduction
2 Static Program Annotation
3 Optimistic Optimization Opportunities
3.1 Potentially Overflowing Computations
3.2 Potentially Parallel Loops
3.3 Control Flow Speculation
3.4 Function Behavior
3.5 Pointer Attributes
3.6 Overlapping and Inconsistent Annotations
4 Implementation Details
4.1 Granularity of Optimistic Opportunities
4.2 Search Space Exploration
5 Evaluation
5.1 RSBench (A)
5.2 XSBench (B)
5.3 PathFinder (C)
5.4 CoMD (D)
5.5 Pennant (E)
5.6 MiniGMG (F)
5.7 Successfully Verified Annotations
5.8 Optimistic Choices
5.9 Comparison with Link Time Optimization (LTO)
6 Related Work
7 Conclusion and Future Work
References
Programming Models and Systems Software
End-to-End Resilience for HPC Applications
1 Introduction
2 Background
3 Assumptions
4 End-to-End Resilience
5 Implementation Details
6 Experimental Results
6.1 Matrix Multiplication
6.2 TF-IDF
6.3 NAS Parallel Benchmarks
7 Related Work
8 Conclusion
References
Resilient Optimistic Termination Detection for the Async-Finish Model
1 Introduction
2 Background
2.1 Nested Task Parallelism Models
2.2 The X10 Programming Model
3 Related Work
4 Message-Optimal Async-Finish Termination Detection
5 Async-Finish Termination Detection Under Failure
6 Distributed Task Tracking
6.1 Finish and LocalFinish Objects
6.2 Task Events
6.3 Non-resilient Finish Protocol
7 Resilient Pessimistic Finish
7.1 Adopting Orphan Tasks
7.2 Excluding Lost Tasks
8 Our Proposed Protocol: Resilient Optimistic Finish
8.1 Adopting Orphan Tasks
8.2 Excluding Lost Tasks
8.3 Optimistic Finish TLA Specification
9 Finish Resilient Store Implementations
9.1 Reviving the Distributed Finish Store
10 Performance Evaluation
10.1 Microbenchmarks
10.2 LULESH
11 Conclusion
References
Global Task Data-Dependencies in PGAS Applications
1 Introduction
2 Background and Motivation
3 Related Work
4 Global Task Data Dependencies
4.1 Creating the Global Task Graph
4.2 Executing the Global Task Graph
5 Implementation
5.1 Example Code
5.2 Range-Based Task and Dependency Creation
6 Experimental Evaluation
6.1 Micro-benchmarks
6.2 Blocked Cholesky Factorization
6.3 LULESH
7 Conclusion and Future Work
References
Finepoints: Partitioned Multithreaded MPI Communication
1 Introduction
2 Background
2.1 MPI Multithreaded Communication Models
3 Hybrid-Model Design Requirements
3.1 Finepoints: Partitioned Communication
3.2 Partitioned MPI Communication Interface
3.3 Hardware Support for Partitioned Send
4 Experimental Results
4.1 Experimental Platform
4.2 Microbenchmarks
4.3 Message Aggregation Optimizations
4.4 Application Proxies
5 Related Work
6 Conclusions and Future Work
References
Author Index
