Computer Architecture

Name: Computer Architecture | ISCA 2010 International Workshops A4MMC, AMAS-BT, EAMA, WEED, WIOSCA, Saint-Malo, France, June 19-23, 2010, Revised Selected Papers
Brand: Springer
Price: 53.49 EUR
Availability: Online only

Ana Lucia Varbanescu, Anca Molnos, Rob van Nieuwpoort (Editors)

Springer (Publisher)

Published on 15 February 2012

XXVII, 378 pages

E-Book

PDF with digital watermarking

978-3-642-24322-6 (ISBN)

€53.49 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Title Page
Preface
ISCA Workshops Committees
A4MMC Foreword
EAMA Foreword
AMAS-BT Foreword
WEED Foreword
WIOSCA Foreword
Table of Contents
A4MMC: Applications for Multi- and Many-Cores
Accelerating Agent-Based Ecosystem Models Using the Cell Broadband Engine
Introduction
Background
Hardware
Implementation
Performance Evaluation
More on Particle Management
Future Work
Conclusion
References
Performance Impact of Task Mapping on the Cell BE Multicore Processor
Introduction
Cell BE and Benchmark Application
Cell Broadband Engine
Synthetic Benchmark Application
Experiments
Conclusions and Future Work
References
Parallelization Strategy for CELL TV
Introduction
Applications
Parallelization Strategies
Inter Module Parallelization
Inner Module Parallelization and SIMD
Discussion
Conclusion
References
Towards User Transparent Parallel Multimedia Computing on GPU-Clusters
Introduction
Parallel-Horus
GPU-Based Extensions to Parallel-Horus
A Line Detection Application
Curvilinear Structure Detection
Evaluation
Future Work
Conclusions
References
Implementing a GPU Programming Model on a Non-GPU Accelerator Architecture
Introduction and Background
CUDA
MCUDA
Rigel
RCUDA
Source Code Transformations
Runtime Library
Kernel Execution
RCUDA Optimizations
Kernel Code Transformations
Runtime Optimizations
Evaluation
Simulation Infrastructure Methodology
Benchmarks
Baseline Performance
RCUDA Runtime Overhead
Optimizations
DMM Case Study of Performance Portability
Related Work
Conclusion
References
On the Use of Small 2D Convolutions on GPUs
Introduction
Electromagnetic Diffraction at Nano-structures
NVIDIA CUDA GPU Platform
Parallel Implementation on the GPU
Initial CUDA Implementation
Increasing Independent Work
Tuning the Execution Configuration
Optimizing the 2D Convolution
Optimizing Transfers between CPU and GPU
Experiments and Results
Experimental Setup
Performance Measurements
Discussion
Conclusion
References
Can Manycores Support the Memory Requirements of Scientific Applications?
Introduction
Analysis Methodology
Application Analysis
Memory Bandwidth
Memory Footprint
Related Work
Conclusions
References
Parallelizing an Index Generator for Desktop Search
Introduction
What to Parallelize and How?
Filename Generation
Term Extraction
Index Update
Parallelization
Performance Results
Lessons Learned and Conclusion
References
AMAS-BT: 3rd Workshop on Architectural and Micro-Architectural Support for Binary Translation
Computation vs. Memory Systems: Pinning Down Accelerator Bottlenecks
Introduction
Our Vision for Accelerators
Applying Our Methodology to a Trivial Application: Image Rotation
Checking for Any Dependence on Problem Size
Locating Hotspots of Computation and Communication
Hitting the Memory Wall
Distinguishing Local from Global Communication
Understanding Detailed Dataflow Behavior
Our Pintool
Exploring JPEG Acceleration
Related Work
Conclusions
References
Trace Execution Automata in Dynamic Binary Translation
Introduction
Motivation
From Traces to TEA
Building TEA Out of Traces
Recording TEA Instead of Traces
Experimental Results
Implementation Challenges
Analyzing TEA's Performance
Previous and Related Work
Conclusions and Future Work
References
ISAMAP: Instruction Mapping Driven by Dynamic Binary Translation
Introduction
Related Work
ISAMAP
Models
System Overview
Translator Generation
Translator
Endianness
Run-Time
System Calls Mapping
Mapping Improvements
Conditional Mapping
Run-Time Optimizations
Experimental Results
Evaluation
Conclusion
Future Works
References
EAMA: 3rd Workshop for Emerging Applications and Many-Core Architectures
Parallelization of Particle Filter Algorithms
Introduction
Particle Filter Algorithm
MATLAB Implementation
Conversion from MATLAB to C
OpenMP Implementation
Naïve CUDA Implementation
Naïve versus Thrust
CUDA Optimizations
Tree Reductions
GPU Linear Congruential Generator
Results
Integration with MATLAB
Related Work
Conclusions
Recommendations for Further Work
References
What Kinds of Applications Can Benefit from Transactional Memory?
Introduction
Types of TM and How They Can Be Useful
TM Myths and Misconceptions
Evaluating TM Prototypes
Concluding Remarks
References
Characteristics of Workloads Using the Pipeline Programming Model
Introduction
Pipeline Programming Model
Motivation for Pipelining
Uses of the Pipeline Model
Implementations
Methodology
Workloads
Program Characteristics
Experimental Setup
Principal Component Analysis
Experimental Results
Related Work
Conclusions
References
WEED: 2nd Workshop on Energy Efficient Design
The Search for Energy-Efficient Building Blocks for the Data Center
Introduction
Related Work
System Overview
Hardware
Benchmark Details
Measurement Infrastructure
Evaluation
Single-Machine Benchmarks
Multi-machine Dryad Benchmarks
Discussion
Energy Efficiency
The Missing Links
Conclusions
References
KnightShift: Shifting the I/O Burden in Datacenters to Management Processor for Energy Efficiency
Introduction
Overview of Intelligent Platform Management Interface
Enhancing IPMI to Act as Knight
Experimental Setup and Results
Trace Overview
Energy Proportionality Impact on Energy Consumption
Discussion of Energy Proportionality in Future
Related Work
Conclusions
References
Guarded Power Gating in a Multi-core Setting
Introduction
Problem Background
Power-Gating Scenarios
Modeling Methodology
Results
Inter-core Power Gating Results
Intra-core Power Gating Results
Hybrid Power Gating
Discussion
A Case for Guard Mechanism
Conclusions and Future Work
References
Using Partial Tag Comparison in Low-Power Snoop-Based Chip Multiprocessors
Introduction
Background
Motivation
S-PTC: Cache Optimizations
S-PTC
S-PTC Updating
Methodology and Results
Methodology
Performance
Bandwidth Utilization Reduction
Tag Lookup Power
Area
Related Work
Conclusion
References
Achieving Power-Efficiency in Clusters without Distributed File System Complexity
Introduction
Motivation
Problems
Exploiting System-Level Power States
Design
Power-Efficient Nodes
Implementation
Experimental Evaluation
I/O Performance
Cluster Power-Efficiency
Related Work
Conclusions and Future Work
References
What Computer Architects Need to Know about Memory Throttling
Memory Throttling
Overview
Comparison to CPU Clock Throttling
Power and Performance
Infrastructure
System
Workloads
Measurements
Throttling Characterization
Bandwidth
Bandwidth-Limited
Transition
Bandwidth-Saturated
Performance
Power
Conclusion
References
Predictive Power Management for Multi-core Processors
Introduction
Prediction for Power Management
Program Phase Characterization
Methodology
Power Measurement
Performance Counter Measurement
Core Activity Predictor
Core-Level CPU Power Model
Results
Performance Metric Definitions
Quantitative Comparison
Predictive Frequency Boosting
Conclusion
References
WIOSCA: 6th Annual Workshop on the Interaction between Operating Systems and Computer Architecture
IOMMU: Strategies for Mitigating the IOTLB Bottleneck
Introduction
IOMMU Performance Analysis
Virtual I/O Memory Access Patterns
vIOMMU
Analysis of Virtual I/O Memory Access Patterns
IOTLB Miss-Rate Reduction Approaches
Streams Entries Eager Eviction
Non-overlapping Coherent Frames
Large TLB and Higher TLB Associativity
Super-Pages
Prefetching Techniques
Adjacent Mappings Prefetch
Explicit Caching of Mapped Entries
Evaluation of Strategies
Discussion
Related Work
Conclusions
References
Improving Server Performance on Multi-cores via Selective Off-Loading of OS Functionality
Introduction
Background and Motivation
Hardware-Based Decision-Making
Hardware Prediction of OS Syscall Length
Dynamic Estimation of N
Experimental Methodology
Results
Impact of Design Parameters
Comparing Instrumentation and Hardware Prediction
Scalability of Off-Loading
TLB Impact
Related Work
Impact of OS on System Throughput
Hardware Support for Efficient OS Execution
Conclusions
References
Performance Characteristics of Explicit Superpage Support
Introduction
Limitations of Transparent Support
POWER®
Itanium® (IA-64)
X86 Variants (Intel® EM64T, AMD® X86-64)
Related Work
Page Allocation
Page Reclaim
Superpage Reservation
Shared Mapping Accounting
Private Mapping Accounting
Explicit Superpage Support
RAM-Based Filesystem
System V Shared Memory
Anonymous mmap() Mappings
Explicit Programming API
Backing Memory Sections with Superpages
Heap
Mapping Text/Data/BSS
Stack
Evaluation
STREAM (Memory Throughput)
SysBench (Database Workload)
SPECcpu 2006 v1.1 (Computational)
SPECjvm 2008 (Java)
Conclusions
References
Interfacing Operating Systems and Polymorphic Computing Platforms Based on the MOLEN Programming Paradigm
Introduction
Related Work
Background Overview
MOLEN Programming Paradigm
The Runtime Environment
MOLEN Runtime Primitives
MOLEN SET
MOLEN EXECUTE
Dynamic Binding Implementation
Evaluation
Conclusion
References
Extrinsic and Intrinsic Text Cloning
Introduction
Text Cloning: Causes, Implications, Remedies
Extrinsic Text Cloning
Intrinsic Text Cloning
How Important Is ETC and ITC
How to Eliminate ETC and ITC
Grid Computing Systems
Grid Architecture
Extrinsic Text Cloning in Grid
Evaluation Using Simulation
Experimental Framework
Results
Related Work
Conclusions
References
A Case for Coordinated Resource Management in Heterogeneous Multicore Platforms
Introduction
Implementation
The IXP Island of Cores
The x86 Island of Cores
x86-IXP Coordination
Evaluation
RUBiS
MPlayer Benchmark
Discussion of Results - A Case for Coordination
Related Work
Conclusions and Future Work
References
Topology-Aware Quality-of-Service Support in Highly Integrated Chip Multiprocessors
Introduction
Topology-Aware Quality-of-Service
Preliminaries
Topology-Aware Architecture
Shared Region Organization
QOS Support
Topologies
Experimental Methodology
Evaluation Results
Area
Performance
QOS and Preemption Impact
Energy Efficiency
Related Work
Conclusion
References
Author Index
