SCALE: Tackling Communication Bottlenecks in Confidential Multi-GPU ML
This program is tentative and subject to change.
Machine Learning (ML) has become a cornerstone of numerous applications, creating the need for secure and efficient distributed ML frameworks. However, maintaining data privacy in these systems poses significant challenges, particularly in distributed environments where user data and model parameters must frequently be transmitted between GPUs. Confidential GPU computing technologies, such as NVIDIA's Confidential Computing (CC) mode for GPUs, offer hardware-based, enterprise-grade solutions designed to protect ML workloads in untrusted environments (e.g., public clouds). These technologies rely on heterogeneous systems that combine Confidential Virtual Machines (CVMs) with GPU-based Trusted Execution Environments (TEEs). Nevertheless, confidential computing introduces considerable performance overhead due to this complex heterogeneous architecture and the high-throughput data flows that must cross TEE security boundaries. For example, encrypted communication occurs both between CVMs and GPU TEEs and among multiple GPU TEEs, resulting in significant latency compared to native PCIe or high-speed interconnects such as NVLink. Our extensive evaluation shows that these overheads become particularly severe during collective communication operations, which suffer from encryption-induced delays that degrade end-to-end training performance.
To address this, we propose SCALE, a co-encryption approach that leverages underutilized GPU resources, optimizes encryption and authentication, and introduces a communication algorithm tailored to confidential settings. We evaluate our design using real ML workloads and execution traces collected from four HGX H100/H200 clusters. Although CC mode was not available on current NVIDIA software stacks, we incorporate encryption-aware modeling based on hardware specifications, enabling realistic estimation of secure communication overheads. Our results demonstrate a 40–70% reduction in communication-related security costs.
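To make the kind of overhead the abstract describes concrete, the sketch below is a minimal, illustrative alpha-beta style estimate of ring all-reduce time with and without per-hop encryption and authentication. It is not the paper's actual model or measured data; the bandwidth, latency, and AES-GCM throughput constants are assumed placeholder values chosen only to show how encryption costs can dominate collective communication in confidential multi-GPU setups.

```python
# Illustrative sketch only (not SCALE's model): estimate ring all-reduce time
# with and without per-hop encryption/authentication. All constants below are
# assumed placeholders, not measured hardware specifications.

def ring_allreduce_time_us(msg_bytes, num_gpus,
                           link_bw_gbps=400.0,    # assumed per-link bandwidth (Gb/s)
                           link_latency_us=5.0,   # assumed per-hop latency (us)
                           enc_tput_gbps=None):   # assumed AES-GCM throughput; None = plaintext
    """Return estimated ring all-reduce completion time in microseconds."""
    steps = 2 * (num_gpus - 1)                    # reduce-scatter + all-gather phases
    chunk_bits = (msg_bytes / num_gpus) * 8       # bits moved per step per link
    per_step = link_latency_us + chunk_bits / (link_bw_gbps * 1e3)
    if enc_tput_gbps is not None:
        # Each hop pays encrypt on send and decrypt/authenticate on receive.
        per_step += 2 * chunk_bits / (enc_tput_gbps * 1e3)
    return steps * per_step

if __name__ == "__main__":
    msg = 256 * 1024 * 1024                       # 256 MiB gradient bucket
    plain = ring_allreduce_time_us(msg, 8)
    secure = ring_allreduce_time_us(msg, 8, enc_tput_gbps=100.0)
    print(f"plaintext : {plain / 1e3:.2f} ms")
    print(f"encrypted : {secure / 1e3:.2f} ms ({secure / plain:.2f}x)")
```

Under these assumed numbers, the encrypted path is several times slower than the plaintext one, which mirrors the abstract's observation that encryption-induced delays in collectives, rather than compute, become the dominant security cost.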
Tue 3 Feb | Displayed time zone: Hobart
15:50 - 17:10
15:50 | 20m Talk | Compression-Aware Gradient Splitting for Collective Communications in Distributed Training | Main Conference | Pranati Majhi (Texas A&M University), Sabuj Laskar (Texas A&M University), Abdullah Muzahid (Texas A&M University), Eun Jung Kim
16:10 | 20m Talk | SCALE: Tackling Communication Bottlenecks in Confidential Multi-GPU ML | Main Conference | Joongun Park (Georgia Tech), Yongqin Wang (University of Southern California), Huan Xu (Georgia Institute of Technology), Hanjiang Wu (Georgia Institute of Technology), Mengyuan Li (USC), Tushar Krishna (Georgia Institute of Technology)
16:30 | 20m Talk | AutoHAAP: Automated Heterogeneity-Aware Asymmetric Partitioning for LLM Training | Main Conference | Yuanyuan Wang, Nana Tang, Yuyang Wang, Shu Pan, Dingding Yu, Zeyue Wang, Mou Sun, Kejie Fu, Fangyu Wang, Yunchuan Chen, Ning Sun, Fei Yang (all Zhejiang Lab)
16:50 | 20m Talk | Towards Compute-Aware In-Switch Computing for LLMs Tensor-Parallelism on Multi-GPU Systems | Main Conference | Chen Zhang (Shanghai Jiao Tong University), Qijun Zhang (Shanghai Jiao Tong University), Zhuoshan Zhou (Shanghai Jiao Tong University), Yijia Diao (Shanghai Jiao Tong University), Haibo Wang (Huawei), Zhe Zhou (Huawei), Zhipeng Tu (Huawei), Zhiyao Li (Huawei), Guangyu Sun (Peking University), Zhuoran Song (Shanghai Jiao Tong University), Zhigang Ji (Shanghai Jiao Tong University), Jingwen Leng (Shanghai Jiao Tong University), Minyi Guo (Shanghai Jiao Tong University)