
High Performance Computing
Description
The 53 full papers included in this volume were carefully reviewed and selected from 80 submissions. They cover all aspects of research, development, and application of large-scale, high performance experimental and commercial systems. Topics include HPC computer architecture and hardware; programming models, system software, and applications; solutions for heterogeneity, reliability, and power efficiency of systems; virtualization and containerized environments; big data and cloud computing; and artificial intelligence.

Content
- Intro
- Preface
- Organization
- Contents
- HPC I/O in the Data Center Workshop (HPC-IODC 2018)
- 1 Introduction
- 2 Organization of the Workshop
- 2.1 Program Committee
- 3 Workshop Summary
- 3.1 Research Papers
- 3.2 Talks from Experts
- 3.3 Discussion Sessions
- References
- Analyzing the I/O Scalability of a Parallel Particle-in-Cell Code
- 1 Introduction
- 2 Characterization of the I/O System
- 2.1 Throughput Evaluation as a Function of Request Sizes
- 2.2 Throughput Evaluation as a Function of the Number of Nodes
- 3 Analyzing the Application's I/O Scalability
- 3.1 I/O Pattern Analysis
- 3.2 Evaluation of the Weight of I/O Operations
- 3.3 Evaluation of I/O Strategies
- 4 Experimental Evaluation
- 5 Conclusions
- References
- Cost and Performance Modeling for Earth System Data Management and Beyond
- 1 Introduction
- 1.1 Data Growth and Access Requirements
- 1.2 Existing and Emerging Technologies
- 1.3 Addressing Domain Scientists and Their Workflows
- 2 Related Work
- 3 Cost Modeling
- 4 Coarse Grained Model
- 4.1 Resilience Model
- 4.2 Performance Model
- 5 Model Considerations for Common Subcomponents
- 5.1 Compute Nodes
- 5.2 I/O Nodes
- 6 Cost Study for Alternative Deployments
- 7 Application in Cost-Aware I/O Middleware
- 8 Summary
- References
- I/O Interference Alleviation on Parallel File Systems Using Server-Side QoS-Based Load-Balancing
- 1 Introduction
- 2 Research Background
- 2.1 K Computer and Its File Systems
- 2.2 Performance Problems of File I/O on the K Computer
- 2.3 QoS-Based Management at an MDS
- 3 Investigation of Internal File Server Activities
- 4 Performance Evaluation
- 4.1 MDS Response Evaluation Using MDTEST
- 4.2 QoS Impact in Fair-Share Execution Among Concurrent Running Jobs
- 4.3 QoS Impact in Data-Staging
- 5 Related Work
- 6 Concluding Remarks
- References
- Tools for Analyzing Parallel I/O
- 1 Introduction
- 2 Introduction to Performance Analysis
- 2.1 Closed Loop of Performance Tuning
- 2.2 Measurement
- 2.3 Preparation of Applications
- 2.4 Analysis of Data
- 3 Tools
- 3.1 Darshan
- 3.2 Vampir
- 3.3 Mistral/Breeze
- 3.4 SIOX
- 3.5 PIOM-MP
- 3.6 Additional User-Level Tools
- 3.7 Further Administrative Tools
- 3.8 Tools for Unifying Trace Formats
- 4 Example Studies
- 4.1 I/O Performance Analysis at the Application Level
- 4.2 Online Monitoring
- 4.3 Online Monitoring with LLview
- 5 Challenges in Analyzing I/O
- 6 Conclusions
- References
- Workshop on Performance and Scalability of Storage Systems (WOPSSS 2018)
- Understanding Metadata Latency with MDWorkbench
- 1 Introduction
- 2 Related Work
- 3 MDWorkbench
- 4 Experimental Setup
- 5 Results
- 5.1 Impact of Concurrent Execution of Several Metadata Operations
- 5.2 Overview of Results for the Benchmark Phase
- 5.3 Understanding Latencies
- 6 Conclusions
- References
- From Application to Disk: Tracing I/O Through the Big Data Stack
- 1 Introduction
- 2 Big Data Software Stack
- 3 Methodology
- 3.1 Status Quo
- 3.2 Statistics File System
- 4 A Case Study: TeraSort
- 4.1 Setup
- 4.2 Vanilla Hadoop Results
- 4.3 SFS Insights
- 4.4 Optimized Hadoop Results
- 5 SFS Overhead
- 6 Discussion and Limitations
- 7 Related Work
- 8 Future Work
- 9 Conclusions
- References
- IOscope: A Flexible I/O Tracer for Workloads' I/O Pattern Characterization
- 1 Introduction
- 2 IOscope Design and Validation
- 2.1 Foundation: eBPF
- 2.2 IOscope Design
- 2.3 IOscope Validation
- 3 Experiments
- 3.1 Setup, Datasets, and Scenarios
- 3.2 MongoDB Experiments
- 3.3 Cassandra Experiments
- 4 Related Work
- 5 Conclusions
- References
- Exploring Scientific Application Performance Using Large Scale Object Storage
- 1 Introduction
- 2 Background and Related Work
- 3 Emulating Scientific Applications Using Object Storage
- 3.1 Emulator Implementation
- 4 Experimental Environment
- 5 Evaluation
- 6 Conclusion
- References
- Benefit of DDN's IME-FUSE for I/O Intensive HPC Applications
- 1 Introduction
- 2 Related Work
- 3 Test Environment
- 3.1 Benchmarks
- 4 Experiment Configuration
- 4.1 Open/Close Times
- 4.2 Performance
- 5 Evaluation
- 5.1 Application Kernel Using HDF5
- 5.2 Performance Variability with Individual I/Os
- 6 Conclusion
- References
- Performance Study of Non-volatile Memories on a High-End Supercomputer
- 1 Introduction
- 2 Related Work
- 3 Methodology and Technical Specifications
- 3.1 The Device Specifications
- 3.2 Experimental Methodology
- 3.3 Benchmarking
- 4 Evaluation
- 4.1 Transfer Size Impact
- 4.2 Weak Scaling
- 4.3 File Size Impact
- 5 Conclusion
- References
- Self-optimization Strategy for IO Accelerator Parameterization
- 1 Introduction
- 2 Self-optimization Strategy
- 3 Inference of the Accelerator Parameters
- 3.1 Regression of the Objective Function
- 3.2 Search for the Optimal Parameterization
- 4 Experiments and Results
- 5 Conclusion
- References
- 13th Workshop on Virtualization in High-Performance Cloud Computing (VHPC 2018)
- utmem: Towards Memory Elasticity in Cloud Workloads
- 1 Introduction
- 2 Background
- 3 Overview
- 3.1 Design
- 3.2 Implementation
- 4 Evaluation
- 4.1 Evaluation of utmem Under Memory Pressure
- 4.2 Evaluation of utmem Under Nonexistent Memory Pressure
- 5 Related Work
- 6 Conclusion and Future Work
- 6.1 Future Work
- 6.2 Conclusion
- References
- Efficient Live Migration of Linux Containers
- 1 Introduction
- 2 Live Migration with CRIU
- 2.1 Pre-copy Migration with CRIU
- 2.2 Post-copy Migration with CRIU
- 2.3 Automatic Transfer of Image Files
- 2.4 Design of Post-copy Memory Migration
- 2.5 Combining Pre-copy and Post-copy Migration
- 3 Using Image Cache and Image Proxy for Container Live Migration
- 4 Evaluation
- 5 Discussion and Future Work
- 6 Conclusion
- References
- Third International Workshop on In Situ Visualization: Introduction and Applications (WOIV 2018)
- Introduction
- Organization of the Workshop
- 2.1 Organizing Committee
- 2.2 Program Committee
- Workshop Summary
- 3.1 Invited Talks
- 3.2 Research Papers
- Coupling the Uintah Framework and the VisIt Toolkit for Parallel In Situ Data Analysis and Visualization and Computational Steering
- 1 Introduction
- 2 Background
- 2.1 In Situ Data Analysis and Visualization
- 2.2 Utilization of Diagnostic Data
- 3 Methods
- 3.1 Per-Rank Runtime Performance Data
- 3.2 The Simulation Dashboard
- 4 Results
- 5 Conclusion
- References
- Binning Based Data Reduction for Vector Field Data of a Particle-In-Cell Fusion Simulation
- 1 Introduction
- 2 Related Work
- 2.1 In Situ Visualization
- 2.2 XGC1 Visualization
- 2.3 Large-Scale Particle Visualization
- 3 Binning of Fusion Data
- 3.1 Generating the Binned Data
- 4 Experimental Overview
- 4.1 Workflow Description
- 4.2 Evaluating Accuracy
- 5 Results
- 5.1 Test Result Summary
- 5.2 Poincaré Test Results
- 5.3 Streamline/Pathline Test Results
- 6 Conclusions and Future Work
- References
- In Situ Analysis and Visualization of Fusion Simulations: Lessons Learned
- 1 Introduction
- 2 Related Work
- 3 Motivation
- 4 Setup
- 4.1 Campaign 1
- 4.2 Campaign 2
- 5 Results
- 5.1 Campaign 1
- 5.2 Campaign 2
- 6 Discussion
- 7 Conclusion and Future Works
- References
- Design of a Flexible In Situ Framework with a Temporal Buffer for Data Processing and Visualization of Time-Varying Datasets
- 1 Introduction
- 2 Temporal Buffer
- 2.1 Description
- 2.2 Buffer Operations
- 2.3 Code Integration
- 2.4 Update Parameters and Steering
- 3 Computational Environment and Implementation
- 3.1 Computational Environment
- 3.2 Software System
- 3.3 Use In Situ and in Transit Scenarios
- 4 Evaluation and Discussion
- 4.1 Target Application
- 4.2 Considerations on the Number of Time Steps to Hold
- 4.3 Target Processing Examples
- 4.4 Data Transfer Performance (In Transit Scenario)
- 5 Conclusion
- References
- Streaming Live Neuronal Simulation Data into Visualization and Analysis
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 NESCI - Neuronal Simulator Conduit Interface
- 3.2 CONTRA - Conduit Transport
- 4 Application
- 4.1 NEST Simulation
- 4.2 2D Visualization
- 4.3 3D Visualization
- 5 Performance Evaluation
- 6 Conclusion and Future Work
- References
- Enabling Explorative Visualization with Full Temporal Resolution via In Situ Calculation of Temporal Intervals
- 1 Introduction
- 2 Related Work
- 2.1 Individual Time Slice Data
- 2.2 Multiple Time Slice Data
- 2.3 Complete Temporal Data
- 2.4 Impact of Error in Compression
- 2.5 How Our Approach Differs from Previous Work
- 3 Algorithm
- 3.1 Error Bound
- 3.2 Compression Approaches
- 3.3 Memory Requirements
- 3.4 Reconstruction
- 4 Evaluation
- 4.1 Experiment Configuration
- 4.2 Phase Overview and Measurements
- 4.3 Hardware
- 4.4 Software
- 5 Results
- 5.1 Phase One: GHOST, LULESH, XGC1 Particle Ions, and Tornado
- 5.2 Phase Two: Comparison with Wavelets and SZ
- 5.3 Phase Three: In Situ Experimentation
- 6 Conclusion
- 7 Future Work
- References
- In-Situ Visualization of Solver Residual Fields
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Solvers and Residual Fields
- 3.2 Aggregated Residual Fields
- 3.3 Residual Curves
- 3.4 Residual Stacks
- 3.5 In-Situ Application
- 4 Results
- 4.1 Implementation and Timings
- 4.2 Kármán Runs
- 4.3 Mesh Resolution Experiment
- 4.4 Grid Refinement Experiment
- 5 Conclusion
- References
- An In-Situ Visualization Approach for the K Computer Using Mesa 3D and KVS
- 1 Introduction
- 2 Related Work
- 3 Mesa 3D on the K Computer
- 4 OpenGL-Based KVS Library
- 4.1 Particle Based Volume Rendering
- 4.2 Traditional Rendering Methods
- 5 Experimental Results
- 6 Conclusions
- References
- 4th International Workshop on Communication Architectures for HPC, Big Data, Deep Learning and Clouds at Extreme Scale (ExaComm 2018)
- Introduction
- Organization
- 2.1 Program Committee
- Workshop Summary
- 3.1 Invited Talks
- 3.2 Research Papers
- 3.3 Panel Discussion
- Comparing Controlflow and Dataflow for Tensor Calculus: Speed, Power, Complexity, and MTBF
- 1 Introduction to Ultimate Dataflow
- 2 Introduction to Tensor Calculus
- 3 Existing Solutions
- 3.1 An Overview of Tensor Operations
- 3.2 An Overview of Underlying Hardware
- 4 The Dataflow Approach
- 5 Tensor Operations on the Dataflow Architecture
- 5.1 Tensor Addition
- 5.2 Tensor Transpose
- 5.3 Tensor Composition
- 5.4 Tensor Inverse
- 5.5 Primary and Principal Invariants
- 5.6 Eigenvalues and Eigenvectors
- 5.7 Spectral Decomposition
- 5.8 Divergence of a Tensor Field
- 5.9 Tensor Rank
- 6 Performance Evaluation
- 6.1 Speedup
- 6.2 Power Dissipation
- 6.3 Complexity
- 6.4 Mean Time Between Failures
- 7 Conclusion
- References
- Supercomputer in a Laptop: Distributed Application and Runtime Development via Architecture Simulation
- 1 Introduction
- 2 Prior Work
- 3 Simulator Implementation
- 3.1 Encapsulation
- 3.2 Interception and uGNI Bindings
- 3.3 Skeletonization
- 3.4 Overhead Pragma
- 4 Example Results
- 4.1 Methodology
- 4.2 GASNet Benchmark
- 4.3 Scaling of Skeletonized Runtime
- 5 Additional Features and Future Work
- 5.1 Deterministic Debugging of Distributed Race Conditions
- 5.2 Deterministic, Controlled Environment for Performance Comparisons
- 5.3 Host Compute Overhead Estimates
- 5.4 Valgrind and GDB
- 5.5 Extending to OpenMPI and Infiniband
- 6 Conclusion
- References
- International Workshop on OpenPOWER for HPC 2018 (IWOPH 2018)
- References
- CGYRO Performance on Power9 CPUs and Volta GPUs
- 1 Introduction
- 1.1 CGYRO: A Multiscale-Optimized Fusion Plasma Solver
- 1.2 Porting CGYRO to GPUs
- 1.3 Simulations Suitable for Benchmarking
- 2 Compiling and Running CGYRO on Power9
- 2.1 Porting CGYRO to Power9
- 2.2 Evaluating the Effect of Hyperthreading
- 2.3 The Impact of Volta GPUs on CGYRO Performance
- 3 Comparing to Other Systems
- 3.1 CPU-Only Tests
- 3.2 Full Node Tests
- 4 Summary
- References
- A 64-GB Sort at 28 GB/s on a 4-GPU POWER9 Node for Uniformly-Distributed 16-Byte Records with 8-Byte Keys
- Abstract
- 1 Introduction
- 2 System Attributes and Upper Bounds
- 3 Sorting Algorithm
- 3.1 Partitioner Design
- 3.2 Design of the Shuffle Phase
- 3.3 Sorting a Single Partition
- 3.4 Single-Node Sort
- 3.5 Multi-node Sort
- 4 Sort Performance
- 4.1 Single-Node Sort Performance
- 4.2 Multi-node Sort Performance
- 5 Future Work
- 6 Summary and Conclusions
- Acknowledgements
- References
- Early Experience on Running OpenStaPLE on DAVIDE
- 1 Introduction
- 2 OpenStaPLE
- 3 The DAVIDE Cluster
- 4 Performance Analysis of OpenStaPLE
- 4.1 Benchmarking of Interconnects
- 4.2 Energy Performance
- 5 Conclusions and Future Prospects
- References
- Porting and Benchmarking of BWAKIT Pipeline on OpenPOWER Architecture
- Abstract
- 1 Introduction
- 2 BWAKIT Pipeline Implementation
- 3 Experimental Benchmarking Setup
- 4 Benchmarking Methodology
- 5 Performance Metrics Used for Benchmarking
- 6 Validation of BWAKIT Results
- 7 Conclusion
- Acknowledgement
- References
- Improving Performance and Energy Efficiency on OpenPower Systems Using Scalable Hardware-Software Co-design
- 1 Introduction
- 2 GEOPM on OpenPower
- 2.1 GEOPM Overview
- 2.2 Measuring Power and Performance
- 2.3 Port
- 3 Preliminary Results
- 3.1 Experimental Setup
- 3.2 Applications Profiles
- 4 Conclusions and Future Work
- References
- Porting DMRG++ Scientific Application to OpenPOWER
- 1 Introduction
- 2 Motivation
- 3 Density Matrix Renormalization Group
- 3.1 The Application
- 3.2 Baseline Performance Characteristics of the Application
- 3.3 Hamiltonian Matrix
- 3.4 Pseudo Code: Apply Hamiltonian Target
- 3.5 Types of Available Parallelism in the Kronecker Product Algorithm
- 4 Problem Statement
- 5 Implementation and Experimental Evaluations
- 5.1 Experimental Setup
- 5.2 Pseudo Codes for Evaluation
- 5.3 Evaluation
- References
- Job Management with mpi_jm
- 1 Introduction
- 2 mpi_jm
- 2.1 Masters
- 2.2 The Scheduler
- 2.3 Workers
- 2.4 Issues and Dependencies
- 3 Individual Tasks and the Python Interface
- 4 Initial Performance
- References
- Compile-Time Library Call Detection Using CAASCADE and XALT
- 1 Introduction
- 2 CAASCADE Overview
- 3 Library Detection
- 3.1 Compiler Plugins
- 3.2 Classification of the Libraries Calls
- 3.3 Call Graph Analysis for Library Detection
- 4 Experiments and Results
- 5 Related Work
- 6 Future Work
- 6.1 Inter-procedural Pointer Analysis
- 6.2 Linkage Information
- References
- NUMA-Aware Data-Transfer Measurements for Power/NVLink Multi-GPU Systems
- 1 Introduction
- 2 Explicit Data Transfer
- 3 Unified Memory
- 3.1 Page Fault Latency
- 4 Conclusion
- References
- IXPUG Workshop: Many-Core Computing on Intel Processors: Applications, Performance and Best-Practice Solutions
- IXPUG in an Evolving World - The New IXPUG
- 1.1 What You Should Know About IXPUG
- 1.2 Working Groups and Discussion Forum
- 1.3 The IXPUG Steering Committee
- Workshop Overview
- Call for Papers
- Workshop Agenda
- Program Committee
- Workshop Organizers
- Reference
- Sparse CSB_Coo Matrix-Vector and Matrix-Matrix Performance on Intel Xeon Architectures
- 1 Introduction
- 2 System Architecture
- 3 Methodology
- 3.1 CSB_Coo
- 4 SPMM
- 4.1 Vectorization
- 5 SPMV
- 5.1 Thread Scaling
- 5.2 AVX-512 CD Instructions
- 5.3 Manually Removing the Conflicts
- 6 Conclusions
- References
- Lessons Learned from Optimizing Kernels for Adaptive Aggregation Multi-grid Solvers in Lattice QCD
- 1 Introduction
- 2 Restrictor Definition and Implementation
- 2.1 No Inner Parallelism or Parallelism Using Atomics
- 2.2 Explicit OpenMP Nested Parallelism
- 2.3 OpenMP Custom Reductions
- 2.4 Manual Fake-Out of Nested Parallelism
- 3 Performance Results
- 3.1 Experimental Setup
- 3.2 Performance Results
- 4 Conclusions and Outlook
- References
- Distributed Training of Generative Adversarial Networks for Fast Detector Simulation
- 1 Introduction
- 2 Previous Work
- 3 Three-Dimensional GANs for Calorimeter Simulation
- 3.1 Calorimeter Data
- 3.2 Networks Architecture
- 3.3 Physics Validation
- 3.4 Computing Performance and Training Time
- 4 Distributed Training
- 4.1 Distributing the Training of the 3DGAN with Keras/Tensorflow and Horovod
- 4.2 Execution Environment
- 4.3 Scaling Results
- 4.4 Validation at Scale
- 5 Conclusions and Future Goals
- References
- Workshop on Sustainable Ultrascale Computing Systems
- Introduction
- Workshop Program Committee
- Workshop Chair
- Program Committee
- Workshop Summary
- Cache-Aware Roofline Model and Medical Image Processing Optimizations in GPUs
- 1 Introduction
- 2 Background
- 2.1 CARM: Cache-Aware Roofline Model
- 2.2 Reconstruction Algorithms in Medical Imaging
- 3 Characterization and Profiling Method
- 3.1 CARM-Based Profiling Tool for GPU Applications
- 3.2 Kernels for Medical Image Processing in CT
- 4 Experimental Results
- 4.1 High-End GPU Evaluation
- 4.2 Commodity GPU Evaluation
- 5 Related Work
- 6 Conclusions
- References
- How Pre-multicore Methods and Algorithms Perform in Multicore Era
- 1 Introduction
- 2 How Much Performance and Energy You Can Lose Through Load Balancing on Multicore Platforms
- 2.1 When Does Load Balancing Work?
- 2.2 When Does Load Balancing Not Work?
- 2.3 New Methods and Algorithms for Performance and Energy Optimization on Multicore-Based Platforms
- 3 PMC-Based Power and Energy Modelling in Multicore Era
- 4 Conclusion
- References
- Approximate and Transprecision Computing on Emerging Technologies (ATCET 2018)
- Preface
- Impact of Approximate Memory Data Allocation on a H.264 Software Video Encoder
- 1 Introduction
- 1.1 Approximate Memory
- 2 OS Managed Approximate Memory and AppropinQuo Emulator
- 3 H.264 Video Encoding
- 3.1 Approximate Memory Data Allocation for the x264 Encoder
- 4 Results
- 4.1 Output with Approximate Memory and Energy Saving Considerations
- 5 Conclusion
- References
- Residual Replacement in Mixed-Precision Iterative Refinement for Sparse Linear Systems
- 1 Introduction
- 2 Residual Replacement for Krylov Methods
- 2.1 Preconditioned Conjugate Gradient (PCG)
- 2.2 Residual Replacement
- 3 Evaluation
- 3.1 Cost Model
- 3.2 Cost Analysis
- 4 Concluding Remarks
- References
- Training Deep Neural Networks with Low Precision Input Data: A Hurricane Prediction Case Study
- 1 Introduction
- 2 Background and Related Work
- 3 Hurricane Prediction Case Study
- 3.1 Deep Learning for Hurricane Prediction
- 3.2 Reduced Input Data Precision
- 4 Results and Discussion
- 5 Conclusion and Future Work
- References
- A Transparent View on Approximate Computing Methods for Tuning Applications
- 1 Introduction
- 2 Exploit Performance Profiles as Transparent View on Approximate Computing Methods
- 3 How to Consider Multiple Objectives?
- 4 Taking Conventional Methods into Account
- 5 Exploiting PPs for System Tuning
- 6 Conclusion
- References
- Exploring the Effects of Code Optimizations on CPU Frequency Margins
- 1 Introduction
- 2 Methodology
- 3 Compiler Optimizations Analysis
- 4 Source Code Transformations
- 4.1 Memory Access Pattern Optimizations
- 4.2 SIMD Optimizations
- 5 Related Work
- 6 Conclusions
- References
- First Workshop on the Convergence of Large-Scale Simulation and Artificial Intelligence
- Taking Gradients Through Experiments: LSTMs and Memory Proximal Policy Optimization for Black-Box Quantum Control
- 1 Introduction
- 2 Quantum Control
- 3 Reinforcement Learning: Why and What?
- 4 The Learning Algorithm
- 5 Applying the Method
- 5.1 Quantum Memory
- 5.2 Ground State Transitions
- 6 Results
- 6.1 Quantum Memory
- 6.2 Ground State Transition
- 7 Conclusion and Future Work
- References
- Towards Prediction of Turbulent Flows at High Reynolds Numbers Using High Performance Computing Data and Deep Learning
- 1 Introduction
- 2 Deep Learning and Turbulence
- 3 DNS Data Base
- 4 Results
- 5 Conclusions
- References
- Third Workshop for Open Source Supercomputing (OpenSuCo 2018)
- Using a Graph Visualization Tool for Parallel Program Dynamic Visualization and Communication Analysis
- 1 Introduction
- 2 Related Work
- 3 Methodology
- 3.1 Graph Building
- 3.2 Data Collection
- 3.3 Graph Textual Representation
- 3.4 Graph Visualization and Analysis
- 4 Case Study: NAS Parallel Benchmark
- 4.1 Algorithm Topology
- 4.2 Dynamic Communication Behavior
- 5 Conclusion
- References
- Offloading C++17 Parallel STL on System Shared Virtual Memory Platforms
- 1 Introduction
- 2 Related Work
- 3 Heterogeneous Offloading of Parallel STL
- 4 System Shared Virtual Memory and C++17
- 5 Proof of Concept Implementation
- 5.1 Binary Exchange Format
- 5.2 Indirect Calls and IL Specialization
- 6 Evaluation
- 7 Conclusions
- References
- First Workshop on Interactive High-Performance Computing
- Introduction
- Organization of the Workshop
- 2.1 Program Committee
- 2.2 Summary of the Submissions
- Workshop Summary
- Lessons Learned from a Decade of Providing Interactive, On-Demand High Performance Computing to Scientists and Engineers
- 1 Introduction
- 2 Lessons Learned
- 2.1 Broadening the Definition of Interactive HPC
- 2.2 Re-architecting for Interactive HPC
- 2.3 Reframing the Metrics of Success
- 2.4 Expanding the HPC Ecosystem
- 3 Architecture Requirements for Interactive HPC
- 3.1 System
- 3.2 Software
- 3.3 Supporting Users
- 4 Metrics
- 5 Summary and Future Work
- References
- Enabling Interactive Supercomputing at JSC Lessons Learned
- 1 Introduction
- 2 Background Jupyter
- 3 Jupyter Integration at JSC
- 4 Use Case: Rhinodiagnost
- 5 Use Case: Deep Learning
- 6 Lessons Learned
- 7 Outlook
- 8 Conclusion
- References
- Interactive Distributed Deep Learning with Jupyter Notebooks
- 1 Introduction
- 2 System Architecture
- 3 Distributed Training
- 4 Distributed Hyper-parameter Optimization
- 4.1 Random Search HPO Notebook
- 4.2 HPO with Interactive Widgets
- 4.3 Advanced HPO
- 5 Conclusions
- 6 Code and Recipes
- References
- Third International Workshop on Performance Portable Programming Models for Accelerators (P^3MA 2018)
- 1 Workshop Summary
- 3 Steering Committee
- 4 Program Chairs
- 5 Steering Committee
- Performance Portability of Earth System Models with User-Controlled GGDML Code Translation
- 1 Introduction
- 2 Related Work
- 3 Approach
- 3.1 The General Approach
- 3.2 The User-Controlled Source-to-Source Code Translation
- 4 GGDML Review
- 5 Machine-Specific Configuration
- 5.1 Grid Configuration
- 5.2 Configurable Access Operators
- 5.3 Memory Layout
- 5.4 Parallelization
- 6 Evaluation
- 6.1 Test Application
- 6.2 Test System
- 6.3 Results
- 7 Summary
- 7.1 Future Work
- References
- Evaluating Performance Portability of Accelerator Programming Models using SPEC ACCEL 1.2 Benchmarks
- 1 Introduction
- 2 Motivation
- 3 The SPEC ACCEL Benchmark Suite
- 4 SPEC ACCEL 1.2 Results
- 4.1 Experimental Systems
- 4.2 Performance
- 4.3 Correctness and Functionality
- 4.4 OpenMP and OpenACC Performance Comparison
- 5 Related Work
- 6 Conclusion
- References
- A Beginner's Guide to Estimating and Improving Performance Portability
- 1 Introduction
- 2 Related Work
- 3 The Performance Portability Definition and Metric
- 4 Experimental Setup
- 4.1 The OpenACC Applications
- 4.2 The Platforms
- 5 Computing and Interpreting PPM
- 5.1 The PPM Calculation Workflow
- 5.2 Calculating Performance Efficiency
- 5.3 Case-Studies: PPM Results and Analysis
- 6 Improving Performance Portability
- 6.1 Techniques for Performance Portability Improvement
- 6.2 Case-Studies: Improving Performance Portability
- 7 Conclusion and Future Work
- References
- Profiling and Debugging Support for the Kokkos Programming Model
- 1 Introduction
- 2 Kokkos Profiling Tools
- 2.1 Overview of the Kokkos Profiling Interface
- 2.2 Event Callbacks from Kokkos
- 3 Tools for Profiling Kokkos Applications
- 3.1 Kernel Profiling
- 3.2 Parallel Time Stack Profiling
- 3.3 Memory Event/Heap Profiling
- 4 Conclusions
- 5 Related Work
- 6 Tool Availability
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (Kindle: limited support only).
The file format PDF always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). Unfortunately, on the small screens of e-readers or smartphones, PDFs are rather annoying, requiring too much scrolling.
This eBook uses Watermark-DRM, a "soft" form of copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, a personalised watermark embedded in the eBook can be used to identify the purchaser in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.