
High Performance Computing for Computational Science -- VECPAR 2010

Contents
- Title Page
- Preface
- Organization
- Table of Contents
- Invited Talks
- Exascale Computing Technology Challenges
- Introduction
- Metrics, Cost Functions, and Constraints
- Memory Subsystem
- Memory Bandwidth
- Memory Capacity
- Latency
- Node Architecture Projections for 2018
- Clock Rate
- Instruction Level Parallelism
- Instruction Bundling (SIMD and VLIW)
- Multithreading to Hide Latency
- FPU Organization
- System on Chip (SoC) Integration
- Alternative Exotic Functional Unit Organizations
- Cache Hierarchy
- Levels of Cache Hierarchy
- Private vs. Shared Caches
- Software Managed Caches vs. Conventional Caches
- Intra-node Communication (Networks-on-Chip)
- Cache Coherence (or Lack Thereof)
- Global Address Space
- Fine Grained Synchronization Support
- Power Management
- Node-Scale Power Management
- System-Scale Power Management
- Energy Aware Algorithms
- Library Integration with Power Management Systems
- Compiler Assisted Power Management
- Application-Directed Power Management
- System "Aging"
- Voltage Conversion and Cooling Efficiency
- Fault Detection and Recovery
- Hard (Permanent) Errors
- Soft (Transient) Errors
- Node Localized Checkpointing
- Interconnection Networks
- Topology
- Effect of Interconnect Topology on Interconnect Design
- Conclusions
- References
- The Parallel Revolution Has Started: Are You Part of the Solution or Part of the Problem?
- HPC Techniques for a Heart Simulator
- Game Changing Computational Engineering Technology
- HPC in Phase Change: Towards a New Execution Model
- Linear Algebra and Solvers on Emerging Architectures
- Factors Impacting Performance of Multithreaded Sparse Triangular Solve
- Introduction
- Motivation
- Level-Set Triangular Solver
- Related Work
- Factors Affecting Performance
- Numerical Experiments
- Barriers
- Thread Affinity
- Data Locality
- More Realistic Problems
- Summary and Conclusions
- References
- Performance and Numerical Accuracy Evaluation of Heterogeneous Multicore Systems for Krylov Orthogonal Basis Computation
- Introduction
- Orthogonalization Process
- Accelerators Programming
- Nvidia CUDA-Enabled GPUs
- STI Cell Processor
- Optimizations
- BLAS Operations
- CPU
- GPU
- Cell Broadband Engine
- Experimentation
- Hardware Precision
- Performance Achieved
- Synthesis
- Conclusion
- References
- An Error Correction Solver for Linear Systems: Evaluation of Mixed Precision Implementations
- Introduction
- Mixed Precision Error Correction Methods
- Mathematical Background
- Mixed Precision Approach
- Hardware Platform and Implementation Issues
- Numerical Experiments
- Test Configurations
- Numerical Results
- Result Interpretation
- Conclusions and Future Work
- References
- Multifrontal Computations on GPUs and Their Multi-core Hosts
- Introduction
- Overview of a Multifrontal Sparse Solver
- Graphics Processing Units
- Algorithm for Factoring Individual Frontal Matrices on the GPU
- Performance of the Accelerated Multifrontal Solver
- Summary
- References
- Accelerating GPU Kernels for Dense Linear Algebra
- Introduction
- Performance of Current BLAS for GPUs
- Pointer Redirecting
- MAGMA BLAS Kernels
- Performance
- Conclusions and On-going Work
- References
- A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators
- Introduction
- Cholesky Factorization on Multicore+MultiGPUs
- Principles and Methodology
- Implementations Details
- Memory Optimal
- Data Persistence Optimizations
- Hybrid xPOTRF Kernel
- xSYRK and xTRSM Kernel Optimizations
- Experimental Results
- Environment Setup
- Tuning
- Performance Results
- Related Work
- Summary and Future Work
- References
- On the Performance of an Algebraic Multigrid Solver on Multicore Clusters
- Motivation
- The Algebraic Multigrid (AMG) Solver
- The Hera Multicore Cluster
- Using an MPI-Only Model with AMG
- Replacing On-Node MPI with OpenMP
- The OpenMP Implementation
- Optimizing Memory Behavior with MCSup
- Optimized OpenMP Performance
- Mixed Programming Model
- Investigating the MPI-Only Performance Degradation
- Summary
- References
- An Hybrid Approach for the Parallelization of a Block Iterative Algorithm
- Introduction
- Block Cimmino Algorithm
- Parallelization Strategy
- Manual Parallelism Description
- Automatic Parallelism with MUMPS
- Strategy Details
- Preprocessing
- Solve: The Block-CG Acceleration
- Numerical Results
- Factorization Step
- Solve Step
- Ongoing Work
- References
- Towards an Efficient Tile Matrix Inversion of Symmetric Positive Definite Matrices on Multicore Architectures
- Introduction
- Tile in-place Matrix Inversion
- Algorithmic Study
- Conclusion and Future Work
- References
- A Massively Parallel Dense Symmetric Eigensolver with Communication Splitting Multicasting Algorithm
- Introduction
- Symmetric Dense Eigensolver
- The Communication Splitting Multicasting Algorithm
- The Data Distribution
- The Square Grid Algorithm for Tridiagonalization
- The Process Grid Free Algorithm for Tridiagonalization
- The Process Grid Free Algorithm for Inverse Transformation
- Performance Evaluation
- Machine Environment
- Performance on Different Process Grids
- Execution Performance in a Massively Parallel Environment
- Conclusion
- References
- Large Scale Simulations in CS&E
- Global Memory Access Modelling for Efficient Implementation of the Lattice Boltzmann Method on Graphics Processing Units
- Compute Unified Device Architecture
- Lattice Boltzmann Method
- Methodology
- Modelling
- Throughput
- N ≤ 20
- 21 ≤ N ≤ 39
- Complementary Studies
- Implementations
- References
- Data Structures and Transformations for Physically Based Simulation on a GPU
- Introduction
- Related Work
- Physically Based Simulation Framework
- Coalesced Memory Accesses from Arrays of Objects
- Automated Framework for Physics Data Structures
- Data Transformations and Hierarchically Designed Data Structures
- Performance Results
- Conclusion
- References
- Scalability Studies of an Implicit Shallow Water Solver for the Rossby-Haurwitz Problem
- Introduction
- A Fully Implicit Finite Volume Discretization
- An Inexact Newton's Method with Adaptive Stopping Conditions
- Some Variants of One-Level and Multilevel Schwarz Preconditioners
- Numerical Experiments
- Numerical Conservation
- Performance Tests
- Conclusions
- References
- Parallel Multigrid Solvers Using OpenMP/MPI Hybrid Programming Models on Multi-Core/Multi-Socket Clusters
- Introduction
- Hardware Environment
- Implementation and Optimization of Target Application
- Finite-Volume Application
- Iterative Method with Multigrid Preconditioning
- Procedures for Reordering
- Procedures for Optimization
- Results
- Effect of Coloring and Optimization
- Weak Scaling
- Strong Scaling
- Concluding Remarks
- References
- A Parallel Strategy for a Level Set Simulation of Droplets Moving in a Liquid Medium
- Introduction
- Numerical Simulation of Droplets Sedimenting in Water
- Parallel Hierarchy of Triangulations
- Parallel Performance
- Concluding Remarks
- References
- Optimization of Aircraft Wake Alleviation Schemes through an Evolution Strategy
- Introduction
- Optimization of Wake Alleviation
- Alleviation Scheme
- Optimization of the Lift Distribution and Perturbation
- Methodology
- Vortex Particle Method
- Evolution Strategy
- Coupling and Computation
- Results
- Optimization
- Optimum Parameter Set
- Conclusions
- References
- Parallel and Distributed Computing
- On-Line Multi-Threaded Processing of Web User-Clicks on Multi-Core Processors
- Introduction
- Background and Problem Setting
- Strategies for Read-Write Synchronization
- Experiments
- Conclusions
- References
- Performance Evaluation of Improved Web Search Algorithms
- Introduction
- Search Architecture
- A Cost Estimation Methodology
- Experimental Setting
- Conclusions
- References
- Text Classification on a Grid Environment
- Introduction
- Text Classification
- Naïve Bayes Classifier
- Expectation-Maximization Algorithm
- Grid Environment
- NACAD Grid Environment
- Grid Services
- Naïve Bayes Classifier via the EM Algorithm on a Grid Environment
- Results
- Performance Criteria
- Conclusion
- References
- On the Vectorization of Engineering Codes Using Multimedia Instructions
- Introduction
- Outline of the Boundary Element Theory
- The Application
- The Streaming SIMD Extensions
- Auto-vectorization Compilers
- Compiler Intrinsics
- An SSE Implementation
- Results Summary
- Conclusions
- References
- Numerical Library Reuse in Parallel and Distributed Platforms
- Introduction
- Linear Algebra Libraries
- Imperative Numerical Libraries
- Object Oriented Numerical Libraries
- A Reusable Numerical Library Design Model
- Library Integration in Scientific Workflow Environment
- Experiments
- Conclusion
- References
- Improving Memory Affinity of Geophysics Applications on NUMA Platforms Using Minas
- Introduction
- Related Work
- Minas
- MAi: Memory Affinity interface
- MApp: Memory Affinity preprocessor
- Numarch: NUMA Architecture Module
- Performance Evaluation
- Cache-Coherent NUMA Platforms
- Numerical Scientific Parallel Applications
- Experimental Results
- Conclusion and Future Work
- References
- HPC Environment Management: New Challenges in the Petaflop Era
- Introduction
- Available Tools
- Deployment Tools
- Monitoring Tools
- Proprietary Solutions
- The LEMMing Project
- LEMMing Web Services (LEMM-WS)
- LEMMing Web Application (LEMM-GATE)
- Conclusion
- References
- Evaluation of Message Passing Communication Patterns in Finite Element Solution of Coupled Problems
- Introduction
- EdgeCFD: The Benchmark Software
- Performance Tests
- Concluding Remarks
- References
- Applying Process Migration on a BSP-Based LU Decomposition Application
- Introduction
- MigBSP: Process Rescheduling Model
- LU Decomposition Application
- BSP-Based LU Application Modeling
- Evaluation Methodology
- Results Analysis
- Related Work
- Concluding Remarks
- References
- A P2P Approach to Many Tasks Computing for Scientific Workflows
- Introduction
- Backgrounds on P2P Networks
- Design of SciMule
- SciMule Architectural Features
- SciMule Conceptual Architecture
- SciMule Evaluation
- Conclusions
- References
- Intelligent Service Trading and Brokering for Distributed Network Services in GridSolve
- Introduction
- ServiceTrading
- Inputs for the Service Trader
- The Trader Output
- Inside the Trader
- Overview of GridSolve: A GridRPC Middleware
- Integration of Service Trading into GridSolve
- Generating the Inputs of the Trader
- Discover the Combination of Services
- Call the Services
- The Service Trader C API
- The Matlab Interface
- Experiments
- Summary
- References
- Load Balancing in Dynamic Networks by Bounded Delays Asynchronous Diffusion
- Introduction
- Model
- Notations
- General Load Balancing Scheme
- Dynamical Evolution of the System State
- Choice of the Load Ratios
- Load Balancing Algorithm
- Proof of the Load Balancing Convergence
- Technical Results
- Proof of Theorem 1
- Experimental Evaluation
- Efficiency Evaluation
- Experimental Contexts
- Results
- Conclusion
- References
- A Computing Resource Discovery Mechanism over a P2P Tree Topology
- Introduction
- CoDiP2P Architecture
- Updating Algorithm
- Departure of Peers
- Searching Algorithms
- Exact Query Searching Algorithm
- Range Query Searching Algorithm
- Rebalancing Mechanism
- Experimentation
- Experimental Results
- Conclusions and Future Work
- References
- Numerical Algorithms
- A Parallel Implementation of the Jacobi-Davidson Eigensolver for Unsymmetric Matrices
- Introduction
- The Jacobi-Davidson Method
- Computation of Eigenvalues at the Periphery of the Spectrum
- Computation of Interior Eigenvalues
- Computing Complex Eigenvalues with Real Arithmetic
- Preconditioning
- Implementation Details
- Computational Results
- The Exterior Case
- The Interior Case
- Parallel Performance
- Conclusions and Future Work
- References
- The Impact of Data Distribution in Accuracy and Performance of Parallel Linear Algebra Subroutines
- Introduction
- Background
- Theory of Rounding Errors
- Data Distribution in Numerical Algorithms
- Numerical Experiments
- Platform
- Input Data
- Numerical Results
- Final Remarks and Future Work
- References
- On a Strategy for Spectral Clustering with Parallel Computation
- Introduction
- Parallel Spectral Clustering: Algorithm and Justification
- Choice of the Affinity Parameter s
- Number of Clusters $k$
- Implementation: Algorithm Components
- Pre-processing Step: Partition $S$ in $q$ Subdomains
- Domain Decomposition: Interface and Subdomains
- Spectral Clustering on Subdomains
- Grouping Step
- Parallel Experiments
- Discussion and Alternative
- Numerical Experiments: Geometrical Example
- An Image Segmentation Example
- Conclusion and Ongoing Works
- References
- On Techniques to Improve Robustness and Scalability of a Parallel Hybrid Linear Solver
- The Schur Complement Method and Parallelization
- Efficient Computation of an Approximate Schur Complement
- Sparse Triangular Solution with Sparse Right-Hand-Sides
- Intra-processor Load Balance
- Inter-processor Load Balance
- Parallel Performance
- Conclusion
- References
- Solving Dense Interval Linear Systems with Verified Computing on Multicore Architectures
- Introduction and Motivation
- Parallel and Verified Computing
- Mathematical Background
- Proposed Approach
- Initial Implementation
- Initial Approach Evaluation
- Optimized Parallel Approach
- Optimized Approach Evaluation
- Considerations and Future Work
- References
- TRACEMIN-Fiedler: A Parallel Algorithm for Computing the Fiedler Vector
- Introduction
- The TRACEMIN-Fiedler Algorithm
- Parallel Implementation of TRACEMIN-Fiedler
- Numerical Results
- Conclusions
- References
- Applying Parallel Design Techniques to Template Matching with GPUs
- Introduction
- Template Matching Background
- Case Study: Full Search and On-Card Memory
- GPU Acceleration Method
- Results
- Analysis
- Conclusions and Future Work
- References
- Author Index
System Requirements
File format: PDF
Copy protection: Watermark DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Read the file with the free Adobe Reader or Adobe Digital Editions software, or with any other PDF viewer of your choice.
- Tablet/Smartphone (Android; iOS): Before downloading, install the free Adobe Digital Editions app or the PocketBook app.
- E-book reader: Bookeen, Kobo, Pocketbook, Sony, Tolino, and many others.
The PDF file format displays a book page identically on any hardware. PDF is therefore well suited to complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small displays of e-readers or smartphones, PDFs are unfortunately rather tedious to read because they require too much scrolling. Watermark DRM is a "soft" form of copy protection. Technically, anything remains possible, including unauthorized redistribution. However, the purchaser of the e-book is recorded as a watermark in visible and invisible places, so that any misuse can be traced back.