High Performance Computing for Computational Science - VECPAR 2018

Name: High Performance Computing for Computational Science - VECPAR 2018 | 13th International Conference, São Pedro, Brazil, September 17-19, 2018, Revised Selected Papers
Brand: Springer
Price: 53.49 EUR
Availability: OnlineOnly

13th International Conference, São Pedro, Brazil, September 17-19, 2018, Revised Selected Papers

Hermes Senger, Osni Marques, Rogerio Garcia, Tatiana Pinheiro de Brito, Rogério Iope, Silvio Stanzani, Veronica Gil-Costa (Editors)

Springer (Publisher)

Published on 25 March 2019

XIII, 264 pages

E-Book

PDF with digital watermarking

978-3-030-15996-2 (ISBN)

€53.49 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Intro
Preface
Organization
Contents
Regular Papers
Communication-Free Parallel Mesh Multiplication for Large Scale Simulations
1 Introduction
2 Mesh Multiplication Method
2.1 Communication Map
2.2 Assigning Processes to New Communication Nodes
2.3 Boundary Smoothing
3 Test Case
3.1 YF-17 Mesh Multiplication
4 Conclusions
References
Dynamic Configuration of CUDA Runtime Variables for CDP-Based Divide-and-Conquer Algorithms
1 Introduction
2 Background and Related Works
2.1 N-Queens and ATSP
2.2 CUDA Dynamic Parallelism Programming Model
2.3 GPU-Accelerated Backtracking
3 The Proposed Algorithm
3.1 Memory Requirement Analysis
3.2 Launching the First Kernel Generation
4 Performance Evaluation
4.1 Experimental Protocol and Parameters Settings
4.2 Results
4.3 Discussion
5 Conclusion and Future Works
References
Design, Implementation and Performance Analysis of a CFD Task-Based Application for Heterogeneous CPU/GPU Resources
1 Introduction
2 Basic Concepts
2.1 Computational Fluid Dynamics (CFD)
2.2 Task-Based Parallel Programming Paradigm
3 Related Work and Motivation
4 Design of a Parallel Heterogeneous CFD Application
4.1 Serial CFD Application Design and Implementation
4.2 Task-Based Heterogeneous Parallel Version
5 Performance Analysis
5.1 Experimental Design
5.2 Speedup Analysis: CPU, CPU/GPU, and Ghost Cells
5.3 Block Size Comparison and Application Iteration Overlapping
5.4 Behavior Characterization of the StarPU GPU Version
6 Conclusion
References
Optimizing Packed String Matching on AVX2 Platform
1 Introduction
2 Notions and Basics
3 Algorithms
3.1 EPSMA
3.2 SSEFA
3.3 Cache Optimization
4 Implementation and Experimental Results
5 Conclusions
References
A GPU-Based Metaheuristic for Workflow Scheduling on Clouds
1 Introduction
2 Problem Definition
3 The Sequential Hybrid Evolutionary Algorithm
3.1 Identifying Hotspots
4 Proposed GPU-Based Local Searches
4.1 Move-Element
4.2 2-opt move-element
4.3 Swap-Vm
4.4 4-opt move-element
5 Experimental Results
6 Conclusion Remarks
References
A Systematic Mapping on High-Performance Computing for Protein Structure Prediction
1 Introduction
2 Protein Structure Prediction Problem
3 Research Method
3.1 Research Questions
3.2 Study Selection
3.3 Search String
3.4 Sources
3.5 Inclusion and Exclusion Criteria
3.6 Data Extraction and Synthesis
3.7 Classification Scheme
4 Results
5 Discussion
5.1 Future Research Directions
6 Conclusion
References
Performance Evaluation of Deep Learning Frameworks over Different Architectures
1 Introduction
2 Related Work
3 Deep Learning Frameworks
3.1 Caffe
3.2 TensorFlow
4 Methodology
5 Experimental Results
5.1 Average Iteration Time by Batch Size
5.2 Average per Image Time by Batch Size
5.3 Caffe Loss Calculation Problem
6 Conclusion
References
Non-uniform Domain Decomposition for Heterogeneous Accelerated Processing Units
1 Introduction
2 Related Work
3 Lattice-Boltzmann Method
4 The LBM OpenCL Implementation
5 Experimental Results
5.1 Platform and Environment
5.2 Performance Evaluation
6 Conclusion
References
Performance Evaluation of Two Load Balancing Algorithms for Hybrid Clusters
1 Introduction
2 Related Work
3 The HIS Simulator
4 Static LB Algorithm
5 Performance Evaluation
5.1 Results
5.2 How the Dataset and the CPU Optimizations Impact Performance
6 Conclusion and Future Work
References
An Improved OpenMP Implementation of the TVD-Hopmoc Method Based on a Cluster of Points
1 Introduction
2 The TVD-Hopmoc Method
3 A Basic OpenMP Implementation of the TVD-Hopmoc Method
4 An Improved OpenMP Implementation of the TVD-Hopmoc Method Based on a Cluster of Points
5 Experimental Results
6 Conclusions and Future Directions
References
A Scheduling Theory Framework for GPU Tasks Efficient Execution
1 Introduction
2 Scheduling Theory Applied to GPU Tasks Execution
3 Flow Shop Scheduling
3.1 Slope Index
3.2 NEH Heuristic
3.3 Single Queue
3.4 NEH Heuristic with a GPU Tasks Execution Model
4 Experiments
4.1 Statistical Analysis
4.2 Applicability and Scalability
5 Conclusions
References
A Timer-Augmented Cost Function for Load Balanced DSMC
1 Introduction
2 DSMC
3 Load Balancing
3.1 State of the Art
4 Timer-Augmented Cost Function
5 Method
5.1 Test Case
6 Results
7 Conclusion
References
Accelerating Scientific Applications on Heterogeneous Systems with HybridOMP
1 Introduction
2 OpenMP Offloading Performance
3 PlasCom2
4 The HybridOMP Library
4.1 Work Partitioning
4.2 Data Movement
4.3 Code Execution
4.4 Implementation
5 Experimental Methodology
5.1 Hardware Environment
5.2 Software Environment
5.3 PlasCom2 Configuration
5.4 Execution
6 Results
6.1 Single-Node Experiments
6.2 Multi-node Experiments
7 Related Work
8 Conclusions and Future Work
References
A New Parallel Benchmark for Performance Evaluation and Energy Consumption
1 Introduction
2 Parallel Programming Interfaces
3 Related Work
3.1 Similar Benchmarks
3.2 Comparison Among the Benchmarks
4 Benchmark Applications
4.1 Parallelizing the Applications
4.2 Applications History
5 Results
5.1 Methodology
5.2 Complexity
5.3 Performance and Energy Consumption
6 Conclusions and Future Work
References
Bigger Buffer k-d Trees on Multi-Many-Core Systems
1 Motivation
2 Background
2.1 Massively-Parallel Nearest Neighbor Computations
2.2 Nearest Neighbor Search via k-d Trees
2.3 Revisited: Buffer k-d Trees
3 Processing Bigger Trees
3.1 Construction Phase
3.2 Query Phase
4 Experiments
4.1 Experimental Setup
4.2 Modified Workflow
4.3 Large-Scale Applications
5 Conclusions and Outlook
References
A Parallel Generator of Non-Hermitian Matrices Computed from Given Spectra
1 Introduction
2 Related Work
3 Matrix Generation Algorithm
3.1 Matrix Generation Method
3.2 Algorithm
4 Parallel Implementation
4.1 Basic Implementation on CPUs
4.2 Implementation on Multi-CPU
4.3 Specific Optimized Communication Implementation on CPUs
5 Performance Evaluation
5.1 Hardware Environment
5.2 Strong and Weak Scalability Results and Analysis
5.3 Speedup Results and Analysis
6 Accuracy Verification
6.1 Verification Method
6.2 Experimental Results
6.3 Arithmetic Precision Analysis
7 Conclusion and Perspectives
References
LRMalloc: A Modern and Competitive Lock-Free Dynamic Memory Allocator
1 Introduction
2 Background and Related Work
3 LRMalloc
3.1 High-Level Overview
3.2 Heap
3.3 Pagemap
4 Experimental Results
5 Conclusions and Further Work
References
Short Paper
Towards a Strategy for Performance Prediction on Heterogeneous Architectures
Abstract
1 Introduction
2 Concepts and Problem Definition
2.1 Concepts
2.2 Problem Definition
3 Related Work
4 A Strategy for Performance Prediction
4.1 Profiling Collection
4.2 Data Analysis and Output
5 Evaluation
5.1 Workload and Hardware Description
5.2 Results and Discussion
6 Conclusions and Future Work
Acknowledgements
References
Posters*
HPC for Predictive Models in Healthcare
Abstract
1 Motivation
2 Research Objectives and Outcomes
References
A Methodology for Batching Matrix Kernels in HPC Applications
1 Introduction
2 Small Matrix Kernels
References
Author Index
