Architecture, Networks, and Storage.-
Accelerating MPI All-to-All Communication with Online Compression on Modern GPU Clusters.- NVIDIA's Quantum InfiniBand Network Congestion Control Technology and Its Impact on Application Performance.- LLM: Realizing Low-Latency Memory by Exploiting Embedded Silicon Photonics for Irregular Workloads.- SU3_Bench on a Programmable Integrated Unified Memory Architecture (PIUMA) and How that Differs from Standard NUMA CPUs.-
Machine Learning, AI, and Emerging Technologies.-
"Hey CAI" - Conversational AI Enabled User Interface for HPC Tools.- Hy-Fi: Hybrid Five-Dimensional Parallel DNN Training on High-Performance GPU Clusters.-
HPC Algorithms and Applications.-
Efficient Application of Hanging-Node Constraints for Matrix-Free High-Order FEM Computations on CPU and GPU.- Dynamic Task Fusion for a Block-Structured Finite Volume Solver over a Dynamically Adaptive Mesh with Local Time Stepping.- Accelerating Simulated Quantum Annealing with GPU and Tensor Cores.- m-Cubes: An Efficient and Portable Implementation of Multi-dimensional Integration for GPUs.-
Performance Modeling, Evaluation, and Analysis.-
Comparative Evaluation of Call Graph Generation by Profiling Tools.- MAPredict: Static Analysis Driven Memory Access Prediction Framework for Modern CPUs.- Rapid Execution Time Estimation for Heterogeneous Memory Systems Through Differential Tracing.- Understanding Distributed Deep Learning Performance by Correlating HPC and Machine Learning Measurements.- A Motivating Case Study on Code Variant Selection by Reinforcement Learning.-
Programming Environments and System Software.-
Remote OpenMP Offloading.- Hybrid Parallel ILU Preconditioner in Linear Solver Library GaspiLS.- A Subset of the CERN Virtual Machine File System: Fast Delivering of Complex Software Stacks for Supercomputing Resources.