
High Performance Computing
Description
This book constitutes the refereed proceedings of the 34th International Conference on High Performance Computing, ISC High Performance 2019, held in Frankfurt/Main, Germany, in June 2019.
The 17 revised full papers presented were carefully reviewed and selected from 70 submissions. The papers cover a broad range of topics such as next-generation high performance components; exascale systems; extreme-scale applications; HPC and advanced environmental engineering projects; parallel ray tracing - visualization at its best; blockchain technology and cryptocurrency; parallel processing in life science; quantum computers/computing; what's new with cloud computing for HPC; parallel programming models for extreme-scale computing; workflow management; machine learning and big data analytics; and deep learning and HPC.

Content
- Intro
- Preface
- Organization
- Contents
- Architectures, Networks and Infrastructure
- Evaluating Quality of Service Traffic Classes on the Megafly Network
- 1 Introduction
- 2 Exploring Quality of Service on HPC Networks
- 3 Evaluation Methodology
- 3.1 HPC Simulation Environment
- 3.2 Topology and Routing Description
- 3.3 Network Configurations
- 3.4 Workloads
- 3.5 Rank-to-Node Mappings
- 4 Quantifying Interference on 1-D Dragonfly and Megafly Networks
- 5 Evaluating Quality of Service on Megafly Networks
- 5.1 QoS Mechanism I: Prioritizing Entire Applications
- 5.2 QoS Mechanism II: Prioritizing and Guaranteeing Bandwidth to Latency-Sensitive Operations
- 5.3 Applying QoS Mechanisms to Multiple Application Workloads in Parallel
- 6 Related Work
- 7 Discussion and Conclusion
- References
- Artificial Intelligence and Machine Learning
- Densifying Assumed-Sparse Tensors
- 1 Introduction
- 2 Background
- 3 Issues with Scaling the Transformer Model
- 4 Densifying Assumed-Sparse Tensors
- 5 Experimental Results
- 5.1 Weak Scaling Performance
- 5.2 Strong Scaling
- 5.3 Model Accuracy
- 6 Discussion
- 7 Future Work and Conclusion
- References
- Learning Neural Representations for Predicting GPU Performance
- 1 Introduction
- 2 Background and Motivation
- 2.1 Related Work
- 2.2 Explicit Features
- 2.3 Representation Learning
- 2.4 Collaborative Filtering
- 3 Prediction Model
- 3.1 Multi-layer Perceptron Model
- 3.2 Multiple Training Objectives
- 3.3 Automated Architecture Search
- 4 Experiment Setup
- 4.1 Machine Specification
- 4.2 Benchmarks
- 4.3 Methodology
- 5 Results and Discussions
- 5.1 Performance of Matrix Factorization (R1)
- 5.2 Performance of Multi-layer Perceptron (R2)
- 5.3 Training with Additional Metrics (R3)
- 6 Conclusions
- References
- Data, Storage and Visualization
- SLOPE: Structural Locality-Aware Programming Model for Composing Array Data Analysis
- 1 Introduction
- 2 Preliminaries
- 2.1 Multidimensional Array
- 2.2 User-Defined Function and Programming Model
- 3 SLOPE Programming Model
- 3.1 Abstract Data Type-Stencil
- 3.2 SLOPE Programming Model
- 3.3 Example Data Analysis Using SLOPE
- 4 Parallel Execution Engine
- 4.1 Overview of Parallel Execution Engine
- 4.2 Data Partitioning and Halo Layer
- 4.3 Data and Computing Scheduling
- 4.4 Output Array Dimension
- 4.5 Advanced Features
- 4.6 Implementation of SLOPE
- 5 Evaluation
- 5.1 Evaluation Using Synthetic Data Analysis
- 5.2 Evaluation for SLOPE and Spark Using Real Applications
- 6 Related Work
- 7 Conclusions and Future Work
- References
- A Near-Data Processing Server Architecture and Its Impact on Data Center Applications
- 1 Introduction
- 2 Related Work
- 3 NDP Server Architecture
- 3.1 The Architecture of a Conventional Server
- 3.2 The New NDP Server Architecture
- 4 Implementations
- 4.1 Implementation Methodology
- 4.2 Implementation of SANS and SFNS
- 4.3 Implementation of the Applications
- 5 Evaluation
- 5.1 Evaluation of FFT & LC & HE
- 5.2 Evaluation of the Three k-NN Applications
- 5.3 Impact of NDP on Data Center Applications
- 6 Conclusions
- References
- Comparing the Efficiency of In Situ Visualization Paradigms at Scale
- 1 Introduction
- 2 Related Work
- 3 Experimental Overview
- 4 Results
- 4.1 Time to Solution
- 4.2 Total Cost
- 4.3 Scalability of Visualization Algorithms
- 5 Discussion
- 5.1 In-line and In-transit Cost Models
- 5.2 In-Line and In-Transit Time to Solution
- 6 Conclusion and Future Directions
- References
- Emerging Technologies
- Layout-Aware Embedding for Quantum Annealing Processors
- 1 Introduction
- 2 Background
- 2.1 Quantum Annealing
- 2.2 Minor-Embedding
- 3 Layout-Awareness
- 4 Layout-Aware Embedding Methods
- 4.1 Global Placement
- 4.2 Diffusion-Based Migration
- 4.3 Disperse Router
- 4.4 Combined Approach
- 4.5 Related Work
- 5 Evaluation
- 5.1 Benchmark Problems: Quantum-Dot Cellular Automata
- 5.2 Embedding Results
- 5.3 Sampling Results
- 6 Conclusions
- References
- HPC Algorithms
- Toward Efficient Architecture-Independent Algorithms for Dynamic Programs
- 1 Introduction
- 2 Multi-way Recursive Divide and Conquer
- 2.1 r-way R-DP Design
- 2.2 Additional r-way R-DP Algorithms
- 3 External-Memory GPU Algorithms
- 3.1 GPU Computing Model
- 3.2 Related Work (GPU)
- 3.3 GPU Algorithm Design
- 3.4 I/O Complexities
- 3.5 GPU Experimental Results
- 4 Distributed-Memory Algorithms
- 4.1 Distributed-Memory r-way R-DP
- 4.2 Bandwidth and Latency Lower Bounds
- 4.3 Related Work (Distributed Memory)
- 4.4 Distributed Memory Experimental Results
- 5 Conclusion
- References
- HPC Applications
- Petaflop Seismic Simulations in the Public Cloud
- 1 Introduction and Related Work
- 2 Earthquake Simulations
- 2.1 Fused Forward Simulations
- 2.2 Model Setup
- 2.3 Shared Memory Dynamic Load Balancing
- 3 Cloud Setup
- 4 Benchmarking the Cloud
- 4.1 Floating Point Throughput
- 4.2 Memory
- 4.3 Interconnect
- 4.4 Single-Node Application Performance
- 5 Elastic Scalability
- 6 Discussion and Conclusion
- References
- MaLTESE: Large-Scale Simulation-Driven Machine Learning for Transient Driving Cycles
- 1 Introduction
- 2 Surrogate Modeling for Transient Drive Cycle Simulation
- 2.1 Engine Simulator
- 2.2 ML-Based Surrogate Modeling
- 3 Experimental Results
- 3.1 Setup
- 3.2 Training Data Generation at Scale
- 3.3 Comparison of ML Methods
- 3.4 Impact of Training Set Size
- 3.5 Model Adaptation Using Transfer Learning and Retraining
- 4 Related Work
- 5 Conclusion
- A Appendix
- References
- Performance Modeling and Measurement
- PerfMemPlus: A Tool for Automatic Discovery of Memory Performance Problems
- 1 Introduction
- 2 Related Work
- 3 Automatic Discovery of Performance Problems
- 3.1 False Sharing
- 3.2 Main Memory Bandwidth
- 4 PerfMemPlus Implementation
- 4.1 Profiling Tool
- 4.2 Viewer
- 5 Evaluation
- 5.1 The PARSEC Benchmarks
- 5.2 Canneal
- 5.3 Streamcluster
- 5.4 Freqmine
- 5.5 Mnist
- 5.6 N3LP
- 5.7 Overhead
- 6 Conclusion and Future Work
- References
- GPUMixer: Performance-Driven Floating-Point Tuning for GPU Scientific Applications
- 1 Introduction
- 2 Related Work
- 3 Background and Overview
- 3.1 Example of Mixed-Precision Tuning
- 3.2 Configurations
- 3.3 Overview of Our Approach
- 4 Approach
- 4.1 Kernel Intermediate Representation
- 4.2 FISet Design
- 4.3 FISet Illustration
- 4.4 FISet Properties and Algorithm
- 4.5 Shadow Computations
- 4.6 Limitations
- 5 Evaluation
- 5.1 Comparison Approach: Precimonious
- 5.2 CUDA Programs
- 5.3 Overhead of Shadow Computations
- 5.4 Threshold Settings
- 5.5 Case 1: LULESH
- 5.6 Case 2: CoMD
- 5.7 Case 3: CFD
- 6 Conclusions
- References
- Performance Exploration Through Optimistic Static Program Annotations
- 1 Introduction
- 2 Static Program Annotation
- 3 Optimistic Optimization Opportunities
- 3.1 Potentially Overflowing Computations
- 3.2 Potentially Parallel Loops
- 3.3 Control Flow Speculation
- 3.4 Function Behavior
- 3.5 Pointer Attributes
- 3.6 Overlapping and Inconsistent Annotations
- 4 Implementation Details
- 4.1 Granularity of Optimistic Opportunities
- 4.2 Search Space Exploration
- 5 Evaluation
- 5.1 RSBench (A)
- 5.2 XSBench (B)
- 5.3 PathFinder (C)
- 5.4 CoMD (D)
- 5.5 Pennant (E)
- 5.6 MiniGMG (F)
- 5.7 Successfully Verified Annotations
- 5.8 Optimistic Choices
- 5.9 Comparison with Link Time Optimization (LTO)
- 6 Related Work
- 7 Conclusion and Future Work
- References
- Programming Models and Systems Software
- End-to-End Resilience for HPC Applications
- 1 Introduction
- 2 Background
- 3 Assumptions
- 4 End-to-End Resilience
- 5 Implementation Details
- 6 Experimental Results
- 6.1 Matrix Multiplication
- 6.2 TF-IDF
- 6.3 NAS Parallel Benchmarks
- 7 Related Work
- 8 Conclusion
- References
- Resilient Optimistic Termination Detection for the Async-Finish Model
- 1 Introduction
- 2 Background
- 2.1 Nested Task Parallelism Models
- 2.2 The X10 Programming Model
- 3 Related Work
- 4 Message-Optimal Async-Finish Termination Detection
- 5 Async-Finish Termination Detection Under Failure
- 6 Distributed Task Tracking
- 6.1 Finish and LocalFinish Objects
- 6.2 Task Events
- 6.3 Non-resilient Finish Protocol
- 7 Resilient Pessimistic Finish
- 7.1 Adopting Orphan Tasks
- 7.2 Excluding Lost Tasks
- 8 Our Proposed Protocol: Resilient Optimistic Finish
- 8.1 Adopting Orphan Tasks
- 8.2 Excluding Lost Tasks
- 8.3 Optimistic Finish TLA Specification
- 9 Finish Resilient Store Implementations
- 9.1 Reviving the Distributed Finish Store
- 10 Performance Evaluation
- 10.1 Microbenchmarks
- 10.2 LULESH
- 11 Conclusion
- References
- Global Task Data-Dependencies in PGAS Applications
- 1 Introduction
- 2 Background and Motivation
- 3 Related Work
- 4 Global Task Data Dependencies
- 4.1 Creating the Global Task Graph
- 4.2 Executing the Global Task Graph
- 5 Implementation
- 5.1 Example Code
- 5.2 Range-Based Task and Dependency Creation
- 6 Experimental Evaluation
- 6.1 Micro-benchmarks
- 6.2 Blocked Cholesky Factorization
- 6.3 LULESH
- 7 Conclusion and Future Work
- References
- Finepoints: Partitioned Multithreaded MPI Communication
- 1 Introduction
- 2 Background
- 2.1 MPI Multithreaded Communication Models
- 3 Hybrid-Model Design Requirements
- 3.1 Finepoints: Partitioned Communication
- 3.2 Partitioned MPI Communication Interface
- 3.3 Hardware Support for Partitioned Send
- 4 Experimental Results
- 4.1 Experimental Platform
- 4.2 Microbenchmarks
- 4.3 Message Aggregation Optimizations
- 4.4 Application Proxies
- 5 Related Work
- 6 Conclusions and Future Work
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (Kindle: limited support only).
The PDF format displays each book page identically on any hardware. This makes it suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, there is a personalised watermark embedded in the eBook that can be used to identify the purchaser of the eBook in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.