
High Performance Computing for Computational Science -- VECPAR 2010

Content
- Title Page
- Preface
- Organization
- Table of Contents
- Invited Talks
- Exascale Computing Technology Challenges
- Introduction
- Metrics, Cost Functions, and Constraints
- Memory Subsystem
- Memory Bandwidth
- Memory Capacity
- Latency
- Node Architecture Projections for 2018
- Clock Rate
- Instruction Level Parallelism
- Instruction Bundling (SIMD and VLIW)
- Multithreading to Hide Latency
- FPU Organization
- System on Chip (SoC) Integration
- Alternative Exotic Functional Unit Organizations
- Cache Hierarchy
- Levels of Cache Hierarchy
- Private vs. Shared Caches
- Software Managed Caches vs. Conventional Caches
- Intra-node Communication (Networks-on-Chip)
- Cache Coherence (or Lack Thereof)
- Global Address Space
- Fine Grained Synchronization Support
- Power Management
- Node-Scale Power Management
- System-Scale Power Management
- Energy Aware Algorithms
- Library Integration with Power Management Systems
- Compiler Assisted Power Management
- Application-Directed Power Management
- System "Aging"
- Voltage Conversion and Cooling Efficiency
- Fault Detection and Recovery
- Hard (Permanent) Errors
- Soft (Transient) Errors
- Node Localized Checkpointing
- Interconnection Networks
- Topology
- Effect of Interconnect Topology on Interconnect Design
- Conclusions
- References
- The Parallel Revolution Has Started: Are You Part of the Solution or Part of the Problem?
- HPC Techniques for a Heart Simulator
- Game Changing Computational Engineering Technology
- HPC in Phase Change: Towards a New Execution Model
- Linear Algebra and Solvers on Emerging Architectures
- Factors Impacting Performance of Multithreaded Sparse Triangular Solve
- Introduction
- Motivation
- Level-Set Triangular Solver
- Related Work
- Factors Affecting Performance
- Numerical Experiments
- Barriers
- Thread Affinity
- Data Locality
- More Realistic Problems
- Summary and Conclusions
- References
- Performance and Numerical Accuracy Evaluation of Heterogeneous Multicore Systems for Krylov Orthogonal Basis Computation
- Introduction
- Orthogonalization Process
- Accelerators Programming
- Nvidia CUDA-Enabled GPUs
- STI Cell Processor
- Optimizations
- BLAS Operations
- CPU
- GPU
- Cell Broadband Engine
- Experimentation
- Hardware Precision
- Performance Achieved
- Synthesis
- Conclusion
- References
- An Error Correction Solver for Linear Systems: Evaluation of Mixed Precision Implementations
- Introduction
- Mixed Precision Error Correction Methods
- Mathematical Background
- Mixed Precision Approach
- Hardware Platform and Implementation Issues
- Numerical Experiments
- Test Configurations
- Numerical Results
- Result Interpretation
- Conclusions and Future Work
- References
- Multifrontal Computations on GPUs and Their Multi-core Hosts
- Introduction
- Overview of a Multifrontal Sparse Solver
- Graphics Processing Units
- Algorithm for Factoring Individual Frontal Matrices on the GPU
- Performance of the Accelerated Multifrontal Solver
- Summary
- References
- Accelerating GPU Kernels for Dense Linear Algebra
- Introduction
- Performance of Current BLAS for GPUs
- Pointer Redirecting
- MAGMA BLAS Kernels
- Performance
- Conclusions and On-going Work
- References
- A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators
- Introduction
- Cholesky Factorization on Multicore+MultiGPUs
- Principles and Methodology
- Implementations Details
- Memory Optimal
- Data Persistence Optimizations
- Hybrid xPOTRF Kernel
- xSYRK and xTRSM Kernel Optimizations
- Experimental Results
- Environment Setup
- Tuning
- Performance Results
- Related Work
- Summary and Future Work
- References
- On the Performance of an Algebraic Multigrid Solver on Multicore Clusters
- Motivation
- The Algebraic Multigrid (AMG) Solver
- The Hera Multicore Cluster
- Using an MPI-Only Model with AMG
- Replacing On-Node MPI with OpenMP
- The OpenMP Implementation
- Optimizing Memory Behavior with MCSup
- Optimized OpenMP Performance
- Mixed Programming Model
- Investigating the MPI-Only Performance Degradation
- Summary
- References
- An Hybrid Approach for the Parallelization of a Block Iterative Algorithm
- Introduction
- Block Cimmino Algorithm
- Parallelization Strategy
- Manual Parallelism Description
- Automatic Parallelism with MUMPS
- Strategy Details
- Preprocessing
- Solve: The Block-CG Acceleration
- Numerical Results
- Factorization Step
- Solve Step
- Ongoing Work
- References
- Towards an Efficient Tile Matrix Inversion of Symmetric Positive Definite Matrices on Multicore Architectures
- Introduction
- Tile in-place Matrix Inversion
- Algorithmic Study
- Conclusion and Future Work
- References
- A Massively Parallel Dense Symmetric Eigensolver with Communication Splitting Multicasting Algorithm
- Introduction
- Symmetric Dense Eigensolver
- The Communication Splitting Multicasting Algorithm
- The Data Distribution
- The Square Grid Algorithm for Tridiagonalization
- The Process Grid Free Algorithm for Tridiagonalization
- The Process Grid Free Algorithm for Inverse Transformation
- Performance Evaluation
- Machine Environment
- Performance on Different Process Grids
- Execution Performance in a Massively Parallel Environment
- Conclusion
- References
- Large Scale Simulations in CS&E
- Global Memory Access Modelling for Efficient Implementation of the Lattice Boltzmann Method on Graphics Processing Units
- Compute Unified Device Architecture
- Lattice Boltzmann Method
- Methodology
- Modelling
- Throughput
- N ≤ 20
- 21 ≤ N ≤ 39
- Complementary Studies
- Implementations
- References
- Data Structures and Transformations for Physically Based Simulation on a GPU
- Introduction
- Related Work
- Physically Based Simulation Framework
- Coalesced Memory Accesses from Arrays of Objects
- Automated Framework for Physics Data Structures
- Data Transformations and Hierarchically Designed Data Structures
- Performance Results
- Conclusion
- References
- Scalability Studies of an Implicit Shallow Water Solver for the Rossby-Haurwitz Problem
- Introduction
- A Fully Implicit Finite Volume Discretization
- An Inexact Newton's Method with Adaptive Stopping Conditions
- Some Variants of One-Level and Multilevel Schwarz Preconditioners
- Numerical Experiments
- Numerical Conservation
- Performance Tests
- Conclusions
- References
- Parallel Multigrid Solvers Using OpenMP/MPI Hybrid Programming Models on Multi-Core/Multi-Socket Clusters
- Introduction
- Hardware Environment
- Implementation and Optimization of Target Application
- Finite-Volume Application
- Iterative Method with Multigrid Preconditioning
- Procedures for Reordering
- Procedures for Optimization
- Results
- Effect of Coloring and Optimization
- Weak Scaling
- Strong Scaling
- Concluding Remarks
- References
- A Parallel Strategy for a Level Set Simulation of Droplets Moving in a Liquid Medium
- Introduction
- Numerical Simulation of Droplets Sedimenting in Water
- Parallel Hierarchy of Triangulations
- Parallel Performance
- Concluding Remarks
- References
- Optimization of Aircraft Wake Alleviation Schemes through an Evolution Strategy
- Introduction
- Optimization of Wake Alleviation
- Alleviation Scheme
- Optimization of the Lift Distribution and Perturbation
- Methodology
- Vortex Particle Method
- Evolution Strategy
- Coupling and Computation
- Results
- Optimization
- Optimum Parameter Set
- Conclusions
- References
- Parallel and Distributed Computing
- On-Line Multi-Threaded Processing of Web User-Clicks on Multi-Core Processors
- Introduction
- Background and Problem Setting
- Strategies for Read-Write Synchronization
- Experiments
- Conclusions
- References
- Performance Evaluation of Improved Web Search Algorithms
- Introduction
- Search Architecture
- A Cost Estimation Methodology
- Experimental Setting
- Conclusions
- References
- Text Classification on a Grid Environment
- Introduction
- Text Classification
- Naïve Bayes Classifier
- Expectation-Maximization Algorithm
- Grid Environment
- NACAD Grid Environment
- Grid Services
- Naïve Bayes Classifier via the EM Algorithm on a Grid Environment
- Results
- Performance Criteria
- Conclusion
- References
- On the Vectorization of Engineering Codes Using Multimedia Instructions
- Introduction
- Outline of the Boundary Element Theory
- The Application
- The Streaming SIMD Extensions
- Auto-vectorization Compilers
- Compiler Intrinsics
- An SSE Implementation
- Results Summary
- Conclusions
- References
- Numerical Library Reuse in Parallel and Distributed Platforms
- Introduction
- Linear Algebra Libraries
- Imperative Numerical Libraries
- Object Oriented Numerical Libraries
- A Reusable Numerical Library Design Model
- Library Integration in Scientific Workflow Environment
- Experiments
- Conclusion
- References
- Improving Memory Affinity of Geophysics Applications on NUMA Platforms Using Minas
- Introduction
- Related Work
- Minas
- MAi: Memory Affinity interface
- MApp: Memory Affinity preprocessor
- Numarch: NUMA Architecture Module
- Performance Evaluation
- Cache-Coherent NUMA Platforms
- Numerical Scientific Parallel Applications
- Experimental Results
- Conclusion and Future Work
- References
- HPC Environment Management: New Challenges in the Petaflop Era
- Introduction
- Available Tools
- Deployment Tools
- Monitoring Tools
- Proprietary Solutions
- The LEMMing Project
- LEMMing Web Services (LEMM-WS)
- LEMMing Web Application (LEMM-GATE)
- Conclusion
- References
- Evaluation of Message Passing Communication Patterns in Finite Element Solution of Coupled Problems
- Introduction
- EdgeCFD: The Benchmark Software
- Performance Tests
- Concluding Remarks
- References
- Applying Process Migration on a BSP-Based LU Decomposition Application
- Introduction
- MigBSP: Process Rescheduling Model
- LU Decomposition Application
- BSP-Based LU Application Modeling
- Evaluation Methodology
- Results Analysis
- Related Work
- Concluding Remarks
- References
- A P2P Approach to Many Tasks Computing for Scientific Workflows
- Introduction
- Backgrounds on P2P Networks
- Design of SciMule
- SciMule Architectural Features
- SciMule Conceptual Architecture
- SciMule Evaluation
- Conclusions
- References
- Intelligent Service Trading and Brokering for Distributed Network Services in GridSolve
- Introduction
- ServiceTrading
- Inputs for the Service Trader
- The Trader Output
- Inside the Trader
- Overview of GridSolve: A GridRPC Middleware
- Integration of Service Trading into GridSolve
- Generating the Inputs of the Trader
- Discover the Combination of Services
- Call the Services
- The Service Trader C API
- The Matlab Interface
- Experiments
- Summary
- References
- Load Balancing in Dynamic Networks by Bounded Delays Asynchronous Diffusion
- Introduction
- Model
- Notations
- General Load Balancing Scheme
- Dynamical Evolution of the System State
- Choice of the Load Ratios
- Load Balancing Algorithm
- Proof of the Load Balancing Convergence
- Technical Results
- Proof of Theorem 1
- Experimental Evaluation
- Efficiency Evaluation
- Experimental Contexts
- Results
- Conclusion
- References
- A Computing Resource Discovery Mechanism over a P2P Tree Topology
- Introduction
- CoDiP2P Architecture
- Updating Algorithm
- Departure of Peers
- Searching Algorithms
- Exact Query Searching Algorithm
- Range Query Searching Algorithm
- Rebalancing Mechanism
- Experimentation
- Experimental Results
- Conclusions and Future Work
- References
- Numerical Algorithms
- A Parallel Implementation of the Jacobi-Davidson Eigensolver for Unsymmetric Matrices
- Introduction
- The Jacobi-Davidson Method
- Computation of Eigenvalues at the Periphery of the Spectrum
- Computation of Interior Eigenvalues
- Computing Complex Eigenvalues with Real Arithmetic
- Preconditioning
- Implementation Details
- Computational Results
- The Exterior Case
- The Interior Case
- Parallel Performance
- Conclusions and Future Work
- References
- The Impact of Data Distribution in Accuracy and Performance of Parallel Linear Algebra Subroutines
- Introduction
- Background
- Theory of Rounding Errors
- Data Distribution in Numerical Algorithms
- Numerical Experiments
- Platform
- Input Data
- Numerical Results
- Final Remarks and Future Work
- References
- On a Strategy for Spectral Clustering with Parallel Computation
- Introduction
- Parallel Spectral Clustering: Algorithm and Justification
- Choice of the Affinity Parameter $\sigma$
- Number of Clusters $k$
- Implementation: Algorithm Components
- Pre-processing Step: Partition $S$ in $q$ Subdomains
- Domain Decomposition: Interface and Subdomains
- Spectral Clustering on Subdomains
- Grouping Step
- Parallel Experiments
- Discussion and Alternative
- Numerical Experiments: Geometrical Example
- An Image Segmentation Example
- Conclusion and Ongoing Works
- References
- On Techniques to Improve Robustness and Scalability of a Parallel Hybrid Linear Solver
- The Schur Complement Method and Parallelization
- Efficient Computation of an Approximate Schur Complement
- Sparse Triangular Solution with Sparse Right-Hand-Sides
- Intra-processor Load Balance
- Inter-processor Load Balance
- Parallel Performance
- Conclusion
- References
- Solving Dense Interval Linear Systems with Verified Computing on Multicore Architectures
- Introduction and Motivation
- Parallel and Verified Computing
- Mathematical Background
- Proposed Approach
- Initial Implementation
- Initial Approach Evaluation
- Optimized Parallel Approach
- Optimized Approach Evaluation
- Considerations and Future Work
- References
- TRACEMIN-Fiedler: A Parallel Algorithm for Computing the Fiedler Vector
- Introduction
- The TRACEMIN-Fiedler Algorithm
- Parallel Implementation of TRACEMIN-Fiedler
- Numerical Results
- Conclusions
- References
- Applying Parallel Design Techniques to Template Matching with GPUs
- Introduction
- Template Matching Background
- Case Study: Full Search and On-Card Memory
- GPU Acceleration Method
- Results
- Analysis
- Conclusions and Future Work
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format displays each book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" form of copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, a personalised watermark is embedded in the eBook; it can be used to identify the purchaser in the event of misuse and to provide evidence for legal purposes.