HPCA 2026
Sat 31 January - Wed 4 February 2026 Sydney, Australia
co-located with HPCA/CGO/PPoPP/CC 2026

This program is tentative and subject to change.

Tue 3 Feb 2026 16:10 - 16:30 at Coogee - Distributed and Multi-GPU Training Chair(s): J. Nelson Amaral

Machine Learning (ML) has become a cornerstone of numerous applications, creating the need for secure and efficient distributed ML frameworks. However, maintaining data privacy in these systems poses significant challenges, particularly in distributed environments where user data and model parameters must frequently be transmitted between GPUs. Confidential GPU computing technologies, such as NVIDIA’s Confidential GPU mode, offer hardware-based enterprise solutions designed to protect ML workloads in untrusted environments (e.g., public clouds). These technologies leverage heterogeneous systems that combine Confidential Virtual Machines (CVMs) with GPU-based Trusted Execution Environments (TEEs). Nevertheless, confidential computing introduces considerable performance overhead due to its complex heterogeneous architecture and the high-throughput data flows required across TEE security boundaries. For example, encrypted communication occurs both between CVMs and GPU TEEs and among multiple GPU TEEs, resulting in significant latency compared to native PCIe or high-speed interconnects such as NVLink. Our extensive evaluation shows that these overheads become particularly severe during collective communication operations, which suffer from encryption-induced delays that negatively impact end-to-end training performance.
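To make the scale of this effect concrete, the back-of-envelope sketch below models a ring all-reduce in which every hop pays an encryption and decryption cost on top of the wire transfer. This is not the paper's evaluation methodology; the link bandwidth, encryption throughput, and message size are illustrative placeholders rather than measured H100/H200 figures.

```python
# Back-of-envelope model of how per-hop encryption inflates a ring all-reduce.
# All figures (link bandwidth, AES-GCM throughput, message size) are
# illustrative placeholders, not measured H100/H200 numbers.

def ring_allreduce_time(msg_bytes, n_gpus, link_gbps, enc_gbps=None):
    """Estimate ring all-reduce latency in seconds; enc_gbps=None means no TEE encryption."""
    chunk = msg_bytes / n_gpus                     # bytes moved per hop
    steps = 2 * (n_gpus - 1)                       # reduce-scatter + all-gather hops
    per_step = chunk / (link_gbps * 1e9 / 8)       # wire transfer time per hop
    if enc_gbps is not None:                       # encrypt at sender + decrypt at receiver
        per_step += 2 * chunk / (enc_gbps * 1e9 / 8)
    return steps * per_step

msg = 1 << 30                                      # 1 GiB gradient bucket
plain  = ring_allreduce_time(msg, 8, link_gbps=400)
secure = ring_allreduce_time(msg, 8, link_gbps=400, enc_gbps=100)
print(f"plain: {plain*1e3:.1f} ms, secure: {secure*1e3:.1f} ms, "
      f"slowdown: {secure/plain:.1f}x")
```

Even with generous assumed encryption throughput, the per-hop crypto cost dominates the collective once message sizes reach typical gradient-bucket scale, which is the bottleneck the abstract describes.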

To address this, we propose a co-encryption approach that leverages underutilized GPU resources, optimizes encryption and authentication, and introduces a communication algorithm tailored to confidential settings. We evaluate our design using real ML workloads and execution traces collected from four HGX H100/H200 clusters. Because Confidential Computing (CC) mode was not available on current NVIDIA software stacks, we incorporate encryption-aware modeling based on hardware specifications, enabling realistic estimation of secure communication overheads. Our results demonstrate a 40–70% reduction in communication-related security costs.
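As one illustration of why overlapping encryption with communication pays off, the sketch below compares a serialized encrypt-then-send schedule against a two-stage pipeline over chunked data. It is a toy model, not the SCALE algorithm itself, and the per-chunk times and chunk count are assumed values.

```python
# Minimal sketch (not SCALE itself): chunking lets encryption of the next chunk
# overlap with transfer of the current one, so the serialized enc+send cost
# collapses toward the slower of the two stages per chunk.
# t_enc / t_send per chunk and the chunk count are assumed values.

def serialized_time(chunks, t_enc, t_send):
    # Encrypt a chunk, then send it, strictly one after the other.
    return chunks * (t_enc + t_send)

def pipelined_time(chunks, t_enc, t_send):
    # Two-stage pipeline: fill with one encrypt + one send, then the
    # slower stage dictates the steady-state per-chunk rate.
    return t_enc + t_send + (chunks - 1) * max(t_enc, t_send)

chunks, t_enc, t_send = 32, 0.8, 1.0               # ms per chunk, illustrative
print(serialized_time(chunks, t_enc, t_send))      # 57.6 ms
print(pipelined_time(chunks, t_enc, t_send))       # 32.8 ms
```

Under these assumed numbers the pipelined schedule hides most of the encryption cost behind the transfer, which is the kind of saving a co-encryption design targets.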


Tue 3 Feb

Displayed time zone: Hobart

15:50 - 17:10
Distributed and Multi-GPU Training (Main Conference) at Coogee
Chair(s): J. Nelson Amaral
15:50
20m
Talk
Compression-Aware Gradient Splitting for Collective Communications in Distributed Training
Main Conference
Pranati Majhi Texas A&M University, Sabuj Laskar Texas A&M University, Abdullah Muzahid Texas A&M University, Eun Jung Kim
16:10
20m
Talk
SCALE: Tackling Communication Bottlenecks in Confidential Multi-GPU ML
Main Conference
Joongun Park Georgia Institute of Technology, Yongqin Wang University of Southern California, Huan Xu Georgia Institute of Technology, Hanjiang Wu Georgia Institute of Technology, Mengyuan Li University of Southern California, Tushar Krishna Georgia Institute of Technology
16:30
20m
Talk
AutoHAAP: Automated Heterogeneity-Aware Asymmetric Partitioning for LLM Training
Main Conference
Yuanyuan Wang Zhejiang Lab, Nana Tang Zhejiang Lab, Yuyang Wang Zhejiang Lab, Shu Pan Zhejiang Lab, Dingding Yu Zhejiang Lab, Zeyue Wang Zhejiang Lab, Mou Sun Zhejiang Lab, Kejie Fu Zhejiang Lab, Fangyu Wang Zhejiang Lab, Yunchuan Chen Zhejiang Lab, Ning Sun Zhejiang Lab, Fei Yang Zhejiang Lab
16:50
20m
Talk
Towards Compute-Aware In-Switch Computing for LLMs Tensor-Parallelism on Multi-GPU Systems
Main Conference
Chen Zhang Shanghai Jiao Tong University, Qijun Zhang Shanghai Jiao Tong University, Zhuoshan Zhou Shanghai Jiao Tong University, Yijia Diao Shanghai Jiao Tong University, Haibo Wang Huawei, Zhe Zhou Huawei, Zhipeng Tu Huawei, Zhiyao Li Huawei, Guangyu Sun Peking University, Zhuoran Song Shanghai Jiao Tong University, Zhigang Ji Shanghai Jiao Tong University, Jingwen Leng Shanghai Jiao Tong University, Minyi Guo Shanghai Jiao Tong University