Parallel Computing Technologies

Name: Parallel Computing Technologies | 16th International Conference, PaCT 2021, Kaliningrad, Russia, September 13-18, 2021, Proceedings
Brand: Springer
Price: 96.29 EUR
Availability: OnlineOnly

Victor Malyshkin (Editor)

Springer (Publisher)

Published on 6 September 2021

XII, 480 pages

E-Book

PDF with digital watermarking

978-3-030-86359-3 (ISBN)

€96.29 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Intro
Preface
Organization
Contents
Parallel Programming Methods and Tools
Trace-Based Optimization of Fragmented Programs Execution in LuNA System
1 Introduction
2 Trace Playback in LuNA System
2.1 LuNA System
2.2 Trace Playback in LuNA System
3 Discussion and Related Works
3.1 Analysis and Discussion
3.2 Related Works
4 Experiments
5 Conclusion
References
A New Model-Based Approach to Performance Comparison of MPI Collective Algorithms
1 Introduction
2 Related Work
2.1 Analytical Performance Models of MPI Collective Algorithms
2.2 Measurement of Model Parameters
2.3 Selection of Collective Algorithms Using Machine Learning Algorithms
3 Implementation-Derived Analytical Models of Collective Algorithms
3.1 Binomial Tree Broadcast Algorithm
4 Estimation of Model Parameters
4.1 Estimation of γ(P)
4.2 Estimation of Algorithm Specific α and β
5 Experimental Results and Analysis
5.1 Experiment Setup
5.2 Experimental Estimation of Model Parameters
5.3 Accuracy of Selection of Optimal Collective Algorithms Using the Constructed Analytical Performance Models
6 Conclusions
References
Deterministic OpenMP and the LBP Parallelizing Manycore Processor
1 Introduction
2 Deterministic OpenMP
3 The PISC ISA
4 The LBP Parallelizing Processor
4.1 The Cores
4.2 The Pipeline
4.3 The Memory
5 A Matrix Multiplication Program Example Experiment
6 Conclusion and Perspectives
References
Additional Parallelization of Existing MPI Programs Using SAPFOR
1 Introduction
2 MPI-Aware Parallelization in SAPFOR
3 Results
4 Related Works
5 Conclusion
References
Sparse System Solution Methods for Complex Problems
1 Introduction
2 S3M Package Structure
3 Mathematical Aspects
3.1 AMG Method
3.2 Smoothers
3.3 Constrained Pressure Residual Algorithm
3.4 Two-Stage Method
3.5 Two-Stage Symmetric Gauss-Seidel Algorithm
4 S3M Properties
5 Numerical Experiments
5.1 Symmetric Matrices
5.2 Schur Complement Method for Mimetic Finite Difference Scheme
5.3 CPR for Two-Phase Oil Recovery Problem
References
Resource-Independent Description of Information Graphs with Associative Operations in Set@l Programming Language
1 Introduction
2 Topological Modification of Graphs with Associative Operations
3 Synthesis of Computing Structure
4 Description of Basic Topologies for Information Graphs with Associative Operations in Set@l Programming Language
5 Development of Resource-Independent Program in Set@l Language
6 Conclusions
References
High-Level Synthesis of Scalable Solutions from C-Programs for Reconfigurable Computer Systems
1 Introduction
2 A Review of Existing High-Level Synthesis Tools
3 Calculation Scaling for Reconfigurable Computer System
3.1 Calculation Scaling Based on Inductive Programs Technology
3.2 Calculation Scaling Based on Performance and Hardware Costs Reduction
4 The Methodology of High-Level Synthesis for Scalable Solutions of C-Programs
4.1 Creation of the Problem Information Graph
4.2 Analysis of the Structure of the Problem Information Graph
4.3 Transformation of the Information Graph into the Scalable Parallel Pipeline Form
4.4 Calculation of the Problem Parallelism Parameters for the Available Hardware Resource
4.5 Data Processing Interval Optimization for the Generated Problem Solution
5 Results of Experimental Research for the Developed Complex
6 Conclusion
References
Precompiler for the ACELAN-COMPOS Package Solvers
1 Precompiler for Accelerating Solvers for SLE with Block-Band Matrix
1.1 Iterative Algorithms for Solving the Target SLEs
1.2 Precompiler for the Solver for SLE
1.3 Numerical Experiments with the Solver for SLEs with Block-Band Matrices
1.4 Parallelization of a Loop with Linear Recurrent Dependence
2 Precompiler for Accelerating the Gauss-Seidel Algorithm for Solving the Dirichlet Problem
2.1 Automatic Tiling for the Gauss-Seidel Algorithm for Solving the Dirichlet Problem
2.2 Linearization of Expressions in the OPS
2.3 Gauss-Seidel Algorithm Accelerating by Expression Linearization
3 On the Problem of High-Efficiency Software Portability
References
Execution of NVRAM Programs with Persistent Stack
1 Introduction
2 Model
2.1 System Model
2.2 Failure Model
2.3 Operation Execution
2.4 Correctness
3 Persistent Stack
3.1 Program Stack Concept
3.2 Issues of Existing Implementations
3.3 Persistent Stack Structure
3.4 Update of the Persistent Stack
4 System Implementation
4.1 Pointers to the Memory in NVRAM
4.2 Handling Return Values
4.3 Architecture of the System
5 Verification
5.1 Serializability
5.2 Running Examples
6 Future Work
7 Conclusion
References
A Study on the Influence of Monitoring System Noise on MPI Collective Operations
1 Introduction
2 Background and Related Work
3 The Proposed Method
3.1 Monitoring System Detector
3.2 Statistical Criteria
3.3 Experimental Setup
3.4 Allocation of the Detector
4 Experimental Results
4.1 Using All Logical Cores
4.2 Using Only Physical Cores
4.3 Setting Core Affinity
4.4 Increased Monitoring Frequency
5 Conclusion
References
High-Efficiency Specialized Support for Dense Linear Algebra Arithmetic in LuNA System
1 Introduction
2 Particular Execution Algorithms Approach
2.1 Main Idea
2.2 Main Definitions and Class of Algorithms Description
2.3 Compiler
2.4 Run-Time Library and Task Graph Execution
3 Performance Evaluation
4 Conclusion
References
Applications
Efficient Cluster Parallelization Technology for Aerothermodynamics Problems
1 Introduction
2 Mathematical Model
3 General Organization of the Computational Code
4 Shared Memory Parallelization in the OpenMP Environment
5 Cluster Parallelization Approach
5.1 Grid Optimization and Partitioning
5.2 Structure of the Code
6 Performance Results
7 Conclusion
References
Computational Aspects of Solving Grid Equations in Heterogeneous Computing Systems
1 Introduction
2 Grid Equations Solving Method
3 Software Implementation of the Method for Solving Grid Equations
4 Parallel Implementation
5 Conclusion
References
Optimized Hybrid Execution of Dense Matrix-Matrix Multiplication on Clusters of Heterogeneous Multicore and Many-Core Platforms
1 Introduction
2 Related Work
3 System Model
4 The Ordering Problem
5 Closed-Form Solutions
5.1 Case I: LON Does Not Compute
5.2 Case II: LON Participates in the Computation
6 On-Line Estimation of Cost Parameters
7 Experimental Results
8 Conclusion and Future Work
References
Parallelization of Robust Multigrid Technique Using OpenMP Technology
1 Introduction
2 Parallel RMT
3 Algebraic Parallelism of RMT
4 Geometric Parallelism of RMT
5 Parallel Multigrid Iteration
6 Remarks on Parallel Implementation
7 Conclusions
References
Network Reliability Calculation with Use of GPUs
1 Introduction
2 Notations and Definitions
3 ATR Calculation Using GPU
4 Calculating NRDC Using GPU
5 Conclusion
References
Memory-Efficient Data Structures
Implicit Data Layout Optimization for Portable Parallel Programming in C++
1 Introduction
2 Programming for Accelerators: PACXX vs. State of the Art
3 Data Layout Optimization: From Manual to Automatic
4 Implicit Data Layout: Design and Implementation
5 Conclusion
References
On Defragmentation Algorithms for GPU-Native Octree-Based AMR Grids
1 Introduction
2 Discrete Model on a Cartesian Grid with AMR
3 Parallel Defragmentation Algorithms
4 Numerical Results
5 Conclusions
References
Zipped Data Structure for Adaptive Mesh Refinement
1 Introduction
2 Data Structure
2.1 Random Access and Sequential Traversal
2.2 Algorithms for Neighbor Search
2.3 Mesh Adaptation Algorithms
3 Implementation
4 Benchmarks
5 Conclusion
References
Automatic Parallel Tiled Code Generation Based on Dependence Approximation
1 Introduction
2 Background
3 Idea of Dependence Approximation and Code Generation
3.1 Dependence Approximation
3.2 Time Partitions Constraints and Finding Linearly Independent Solutions to Them
3.3 Parallel Tiled Code Generation
4 Formal Algorithm
5 Related Work
6 Experiment
6.1 Comparison of Schedules Generated for Polybench Benchmarks
6.2 Examining Tile Dimensions for Dynamic Programming Codes
7 Conclusions
References
Experimental Studies
Scalability Issues in FFT Computation
1 Introduction
2 Parallel FFT Performance Bottleneck
2.1 Scalability Issues
2.2 Peak Performance Model
2.3 Choosing the Fastest FFT Parallel Algorithm
3 Experimental Results
3.1 Strong and Weak Scalability
3.2 MPI Selection for Further Acceleration
4 Conclusion
References
High Performance Implementation of Boris Particle Pusher on DPC++. A First Look at oneAPI
1 Introduction
2 Method
3 Data Structures and Algorithm
4 Exploiting Parallelism Using the oneAPI Technology
4.1 Reference Implementation of the Boris Pusher
4.2 Porting the Pusher to DPC++
4.3 Improving Scaling Efficiency
5 Numerical Results
5.1 Computational Infrastructure
5.2 Benchmarks
5.3 Results and Discussion
6 Conclusion
References
Evaluating the Performance of Kunpeng 920 Processors on Modern HPC Applications
1 Introduction
2 Target Architectures
3 Related Work
4 Developed Benchmarking System
5 Implemented Benchmarks
5.1 Triada Variations
5.2 Matrix Transpose
5.3 One Dimensional Stencil
5.4 Two- and Three-Dimensional Stencil
5.5 LCopt
5.6 HPL (solving a SLE)
5.7 Random Number Generation
5.8 N-Body
5.9 Random Memory Access
5.10 Bellman-Ford and Page Rank Graph Algorithms
5.11 OpenMP-Based Benchmarks
5.12 Scaling of Multi-threaded Benchmarks
6 Analysis of Hardware Features
7 Conclusions
References
Job Management
Optimization of Resources Allocation in High Performance Distributed Computing with Utilization Uncertainty
1 Introduction and Related Works
2 Resource Selection Algorithm
2.1 Resources Utilization Model
2.2 Resources Allocation Under Uncertainties
2.3 Near-Optimal Resources Allocation
2.4 Greedy Resources Allocation Algorithm
2.5 Time Scan Optimization
3 Simulation Study
3.1 Simulation Environment
3.2 Dynamic Resources Allocation
3.3 Time Scan Optimization
4 Conclusion and Future Work
References
Influence of Execution Time Forecast Accuracy on the Efficiency of Scheduling Jobs in a Distributed Network of Supercomputers
1 Introduction
2 Experiments
2.1 GDN Testbed
2.2 The Approach for Preparing Test Set
2.3 Assumptions and Methodology of the Simulation
2.4 Simulation Results
3 Conclusion
References
Performance Estimation of a BOINC-Based Desktop Grid for Large-Scale Molecular Docking
1 Introduction
2 Related Works
3 The Project SiDock@home
4 Performance Analysis of a Desktop Grid
4.1 Alternative Performance Metrics of a Desktop Grid
5 Conclusion
References
Essential Algorithms
Consensus-Free Ledgers When Operations of Distinct Processes are Commutative
1 Introduction
1.1 Context of the Study
1.2 Computing Model
1.3 When the Operations append() of Different Processes Commute
2 Underlying Formalization
2.1 A Quick Look at Mazurkiewicz's Traces
2.2 Problem Formalization
2.3 From a Specification to Executions
3 An Algorithm Implementing a PC-Ledger
3.1 Reliable Broadcast
3.2 Local Data Structures
4 Proof of the Algorithm
5 Conclusion
A Exercise: From a PC-Ledger to a Distributed PC-State Machine
References
Design and Implementation of Highly Scalable Quantifiable Data Structures
1 Introduction
2 Proving that a Data Structure is Quantifiably Correct
3 Design of a Quantifiable Stack
4 Design of a Quantifiable Queue
5 Quantifiability Applied to Other Types of Data Structures
6 Performance
7 Related Work
8 Conclusion
References
Optimal Concurrency for List-Based Sets
1 Introduction
2 Concurrency Analysis of List-Based Sets
2.1 Preliminaries
2.2 Concurrency as Admissible Schedules of Sequential Code
2.3 Concurrency Analysis of the Lazy and Harris-Michael Linked Lists
3 The VBL List
3.1 Value-Aware Try-Lock
3.2 VBL List
4 Experimental Evaluation
5 Related Work and Concluding Remarks
References
Mobile Agents Operating on a Grid: Some Non-conventional Issues in Parallel Computing
1 Rules of the Game
2 Several Cops Capture Several Robbers
3 Chasing a Continuous Stream of Invaders
3.1 Herd Immunity from Invaders
3.2 Controlling the Size of an Everlasting Invasion
4 Data Collection in a Sparse Sensor Network
5 Some Considerations and Possible Extensions
References
Computing Services
Parallel Computations in Integrated Environment of Engineering Modeling and Global Optimization
1 Introduction
2 OpenFOAM+Globalizer Integration
3 Optimizing the Beam Profile of Complex Geometry
4 Conclusion
References
Implementing Autonomic Internet of Things Ecosystems - Practical Considerations
1 Introduction
2 Autonomic Computing for the Real-World IoT
2.1 Port Automation Pilot
2.2 Smart Safety of Workers Pilot
2.3 Cohesive Vehicle Monitoring and Diagnostics Pilot
3 State-of-the-Art in Tools for Autonomic Computing
4 Needs vs Available Tools - Critical Analysis
5 Concluding Remarks
References
Information-Analytical System to Support the Solution of Compute-Intensive Problems of Mathematical Physics on Supercomputers
1 Introduction
2 Conceptual Scheme of Intelligent Support for Solving Compute-Intensive Problems
3 The Knowledge Base
4 Implementation of Information-Analytical Internet Resource
5 Related Works
6 Conclusion
References
The Web Platform for Storing Biotechnologically Significant Properties of Bacterial Strains
1 Introduction
2 Web-Platform
2.1 Architecture
2.2 The Data Store
2.3 The Metadata Server
2.4 The Web-Service Core
2.5 The Graphical User Interface
3 Conclusion
References
Cellular Automata
Minimal Covering of the Space by Domino Tiles
1 Introduction
2 Problem Statement
3 Design of the CA Rules
3.1 Templates
3.2 Hit Value
3.3 Processing Scheme
3.4 The First Rule
3.5 The Second Rule: Minimizing the Number of Dominoes
3.6 Performance for Other Field Sizes
4 Conclusion
References
Application of the Generalized Extremal Optimization and Sandpile Model in Search for the Airborne Contaminant Source
1 Introduction
2 Sandpile Model
3 GEO Algorithm
4 Two-Dimensional Cellular Automata for Localization Model
5 The GEO-Sandpile Localization Model
6 Verification of the GEO-Sandpile Model Effectiveness in the Localization of the Contaminant Source
6.1 Generation of Testing Data
6.2 Test Cases Assumptions
7 Results of the GEO-Sandpile Localization Model
7.1 Results for Various Wind Speed Test Cases
7.2 Results for Various Wind Directions and Target Source Positions
8 Conclusions and Future Works
References
Author Index