
High Performance Computing for Computational Science - VECPAR 2018
Description
This book constitutes the thoroughly refereed post-conference proceedings of the 13th International Conference on High Performance Computing in Computational Science, VECPAR 2018, held in São Pedro, Brazil, in September 2018.
The 17 full papers and one short paper included in this book were carefully reviewed and selected from 32 submissions presented at the conference. The papers cover the following topics: heterogeneous systems, shared memory systems and GPUs, and techniques including domain decomposition, scheduling and load balancing, with a strong focus on computational science applications.

Content
- Intro
- Preface
- Organization
- Contents
- Regular Papers
- Communication-Free Parallel Mesh Multiplication for Large Scale Simulations
- 1 Introduction
- 2 Mesh Multiplication Method
- 2.1 Communication Map
- 2.2 Assigning Processes to New Communication Nodes
- 2.3 Boundary Smoothing
- 3 Test Case
- 3.1 YF-17 Mesh Multiplication
- 4 Conclusions
- References
- Dynamic Configuration of CUDA Runtime Variables for CDP-Based Divide-and-Conquer Algorithms
- 1 Introduction
- 2 Background and Related Works
- 2.1 N-Queens and ATSP
- 2.2 CUDA Dynamic Parallelism Programming Model
- 2.3 GPU-Accelerated Backtracking
- 3 The Proposed Algorithm
- 3.1 Memory Requirement Analysis
- 3.2 Launching the First Kernel Generation
- 4 Performance Evaluation
- 4.1 Experimental Protocol and Parameters Settings
- 4.2 Results
- 4.3 Discussion
- 5 Conclusion and Future Works
- References
- Design, Implementation and Performance Analysis of a CFD Task-Based Application for Heterogeneous CPU/GPU Resources
- 1 Introduction
- 2 Basic Concepts
- 2.1 Computational Fluid Dynamics (CFD)
- 2.2 Task-Based Parallel Programming Paradigm
- 3 Related Work and Motivation
- 4 Design of a Parallel Heterogeneous CFD Application
- 4.1 Serial CFD Application Design and Implementation
- 4.2 Task-Based Heterogeneous Parallel Version
- 5 Performance Analysis
- 5.1 Experimental Design
- 5.2 Speedup Analysis: CPU, CPU/GPU, and Ghost Cells
- 5.3 Block Size Comparison and Application Iteration Overlapping
- 5.4 Behavior Characterization of the StarPU GPU Version
- 6 Conclusion
- References
- Optimizing Packed String Matching on AVX2 Platform
- 1 Introduction
- 2 Notions and Basics
- 3 Algorithms
- 3.1 EPSMA
- 3.2 SSEFA
- 3.3 Cache Optimization
- 4 Implementation and Experimental Results
- 5 Conclusions
- References
- A GPU-Based Metaheuristic for Workflow Scheduling on Clouds
- 1 Introduction
- 2 Problem Definition
- 3 The Sequential Hybrid Evolutionary Algorithm
- 3.1 Identifying Hotspots
- 4 Proposed GPU-Based Local Searches
- 4.1 Move-Element
- 4.2 2-opt move-element
- 4.3 Swap-Vm
- 4.4 4-opt move-element
- 5 Experimental Results
- 6 Conclusion Remarks
- References
- A Systematic Mapping on High-Performance Computing for Protein Structure Prediction
- 1 Introduction
- 2 Protein Structure Prediction Problem
- 3 Research Method
- 3.1 Research Questions
- 3.2 Study Selection
- 3.3 Search String
- 3.4 Sources
- 3.5 Inclusion and Exclusion Criteria
- 3.6 Data Extraction and Synthesis
- 3.7 Classification Scheme
- 4 Results
- 5 Discussion
- 5.1 Future Research Directions
- 6 Conclusion
- References
- Performance Evaluation of Deep Learning Frameworks over Different Architectures
- 1 Introduction
- 2 Related Work
- 3 Deep Learning Frameworks
- 3.1 Caffe
- 3.2 TensorFlow
- 4 Methodology
- 5 Experimental Results
- 5.1 Average Iteration Time by Batch Size
- 5.2 Average per Image Time by Batch Size
- 5.3 Caffe Loss Calculation Problem
- 6 Conclusion
- References
- Non-uniform Domain Decomposition for Heterogeneous Accelerated Processing Units
- 1 Introduction
- 2 Related Work
- 3 Lattice-Boltzmann Method
- 4 The LBM OpenCL Implementation
- 5 Experimental Results
- 5.1 Platform and Environment
- 5.2 Performance Evaluation
- 6 Conclusion
- References
- Performance Evaluation of Two Load Balancing Algorithms for Hybrid Clusters
- 1 Introduction
- 2 Related Work
- 3 The HIS Simulator
- 4 Static LB Algorithm
- 5 Performance Evaluation
- 5.1 Results
- 5.2 How the Dataset and the CPU Optimizations Impact Performance
- 6 Conclusion and Future Work
- References
- An Improved OpenMP Implementation of the TVD-Hopmoc Method Based on a Cluster of Points
- 1 Introduction
- 2 The TVD-Hopmoc Method
- 3 A Basic OpenMP Implementation of the TVD-Hopmoc Method
- 4 An Improved OpenMP Implementation of the TVD-Hopmoc Method Based on a Cluster of Points
- 5 Experimental Results
- 6 Conclusions and Future Directions
- References
- A Scheduling Theory Framework for GPU Tasks Efficient Execution
- 1 Introduction
- 2 Scheduling Theory Applied to GPU Tasks Execution
- 3 Flow Shop Scheduling
- 3.1 Slope Index
- 3.2 NEH Heuristic
- 3.3 Single Queue
- 3.4 NEH Heuristic with a GPU Tasks Execution Model
- 4 Experiments
- 4.1 Statistical Analysis
- 4.2 Applicability and Scalability
- 5 Conclusions
- References
- A Timer-Augmented Cost Function for Load Balanced DSMC
- 1 Introduction
- 2 DSMC
- 3 Load Balancing
- 3.1 State of the Art
- 4 Timer-Augmented Cost Function
- 5 Method
- 5.1 Test Case
- 6 Results
- 7 Conclusion
- References
- Accelerating Scientific Applications on Heterogeneous Systems with HybridOMP
- 1 Introduction
- 2 OpenMP Offloading Performance
- 3 PlasCom2
- 4 The HybridOMP Library
- 4.1 Work Partitioning
- 4.2 Data Movement
- 4.3 Code Execution
- 4.4 Implementation
- 5 Experimental Methodology
- 5.1 Hardware Environment
- 5.2 Software Environment
- 5.3 PlasCom2 Configuration
- 5.4 Execution
- 6 Results
- 6.1 Single-Node Experiments
- 6.2 Multi-node Experiments
- 7 Related Work
- 8 Conclusions and Future Work
- References
- A New Parallel Benchmark for Performance Evaluation and Energy Consumption
- 1 Introduction
- 2 Parallel Programming Interfaces
- 3 Related Work
- 3.1 Similar Benchmarks
- 3.2 Comparison Among the Benchmarks
- 4 Benchmark Applications
- 4.1 Parallelizing the Applications
- 4.2 Applications History
- 5 Results
- 5.1 Methodology
- 5.2 Complexity
- 5.3 Performance and Energy Consumption
- 6 Conclusions and Future Work
- References
- Bigger Buffer k-d Trees on Multi-Many-Core Systems
- 1 Motivation
- 2 Background
- 2.1 Massively-Parallel Nearest Neighbor Computations
- 2.2 Nearest Neighbor Search via k-d Trees
- 2.3 Revisited: Buffer k-d Trees
- 3 Processing Bigger Trees
- 3.1 Construction Phase
- 3.2 Query Phase
- 4 Experiments
- 4.1 Experimental Setup
- 4.2 Modified Workflow
- 4.3 Large-Scale Applications
- 5 Conclusions and Outlook
- References
- A Parallel Generator of Non-Hermitian Matrices Computed from Given Spectra
- 1 Introduction
- 2 Related Work
- 3 Matrix Generation Algorithm
- 3.1 Matrix Generation Method
- 3.2 Algorithm
- 4 Parallel Implementation
- 4.1 Basic Implementation on CPUs
- 4.2 Implementation on Multi-CPU
- 4.3 Specific Optimized Communication Implementation on CPUs
- 5 Performance Evaluation
- 5.1 Hardware Environment
- 5.2 Strong and Weak Scalability Results and Analysis
- 5.3 Speedup Results and Analysis
- 6 Accuracy Verification
- 6.1 Verification Method
- 6.2 Experimental Results
- 6.3 Arithmetic Precision Analysis
- 7 Conclusion and Perspectives
- References
- LRMalloc: A Modern and Competitive Lock-Free Dynamic Memory Allocator
- 1 Introduction
- 2 Background and Related Work
- 3 LRMalloc
- 3.1 High-Level Overview
- 3.2 Heap
- 3.3 Pagemap
- 4 Experimental Results
- 5 Conclusions and Further Work
- References
- Short Paper
- Towards a Strategy for Performance Prediction on Heterogeneous Architectures
- Abstract
- 1 Introduction
- 2 Concepts and Problem Definition
- 2.1 Concepts
- 2.2 Problem Definition
- 3 Related Work
- 4 A Strategy for Performance Prediction
- 4.1 Profiling Collection
- 4.2 Data Analysis and Output
- 5 Evaluation
- 5.1 Workload and Hardware Description
- 5.2 Results and Discussion
- 6 Conclusions and Future Work
- Acknowledgements
- References
- Posters*
- HPC for Predictive Models in Healthcare
- Abstract
- 1 Motivation
- 2 Research Objectives and Outcomes
- References
- A Methodology for Batching Matrix Kernels in HPC Applications
- 1 Introduction
- 2 Small Matrix Kernels
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (Kindle: limited support only).
The file format PDF always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). Unfortunately, on the small screens of e-readers or smartphones, PDFs are rather annoying, requiring too much scrolling.
This eBook uses Watermark-DRM, a "soft" copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, there is a personalised watermark embedded in the eBook that can be used to identify the purchaser of the eBook in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.