HDPAT: Hierarchical Distributed Page Address Translation for Wafer-Scale GPUs
Multi-GPU systems deliver high memory capacity and computing power but suffer from slow inter-GPU communication. Wafer-scale GPUs offer a promising answer to this scalability challenge by connecting numerous GPU chiplets through a high-bandwidth, low-latency interposer-based network. While prior work has prototyped wafer-scale GPUs to demonstrate technical feasibility, little research has focused on the architectural designs required to harness the power of such massive devices. With the large number of chiplets and the large-scale interconnect, new bottlenecks emerge that differ from those previously identified in traditional multi-GPU systems. Specifically, virtual-to-physical address translation becomes a critical constraint in wafer-scale GPU systems due to 1) massive numbers of concurrent translation requests and 2) long multi-hop latency in the network. To address these issues, we propose HDPAT, a solution that alleviates translation pressure and enables GPU chiplets to locate the required physical addresses efficiently. HDPAT leverages the GMMUs in other chiplets to improve translation concurrency. Moreover, because translation requests are eventually resolved in the center chiplet (CPU), we let the chiplets close to the center double as translation caches for the surrounding chiplets. Additionally, to reduce compulsory misses, HDPAT prefetches page table entries. Experimental results on 13 representative applications indicate that HDPAT improves overall performance by an average of 57.99% with negligible area overhead: 4.22% of the GPU L2 cache for the cuckoo filter and 0.26% of the GPU L2 cache for the Next-Hop Computation Unit.
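To make the idea of close-to-center chiplets acting as translation caches more concrete, here is a minimal illustrative sketch in Python. It is not the authors' implementation: the 2D mesh layout, X-then-Y routing, the `cache_radius` parameter, and the names `Mesh`, `next_hop`, and `translate` are all assumptions made for illustration. The cuckoo filter, the use of remote GMMUs, and page table entry prefetching mentioned in the abstract are not modeled.

```python
from dataclasses import dataclass, field


def next_hop(pos, center):
    """One step of X-then-Y routing toward the center chiplet (assumed routing)."""
    x, y = pos
    cx, cy = center
    if x != cx:
        return (x + (1 if cx > x else -1), y)
    if y != cy:
        return (x, y + (1 if cy > y else -1))
    return pos  # already at the center


@dataclass
class Mesh:
    center: tuple                 # coordinates of the center chiplet
    page_table: dict              # authoritative VPN -> PFN map held at the center
    cache_radius: int = 1         # hypothetical: chiplets within this distance cache translations
    caches: dict = field(default_factory=dict)  # (x, y) -> {vpn: pfn}

    def is_cache_chiplet(self, pos):
        # Manhattan distance to the center decides whether a chiplet doubles as a cache.
        dist = abs(pos[0] - self.center[0]) + abs(pos[1] - self.center[1])
        return 0 < dist <= self.cache_radius

    def translate(self, src, vpn):
        """Return (pfn, one-way hops travelled before the request was resolved)."""
        pos, hops, missed = src, 0, []
        while pos != self.center:
            pos = next_hop(pos, self.center)
            hops += 1
            if pos == self.center:
                break
            if self.is_cache_chiplet(pos):
                cache = self.caches.setdefault(pos, {})
                if vpn in cache:
                    return cache[vpn], hops  # resolved early at a cache chiplet
                missed.append(pos)
        pfn = self.page_table[vpn]           # resolved at the center
        for p in missed:
            self.caches[p][vpn] = pfn        # fill the caches that missed on the way in
        return pfn, hops


mesh = Mesh(center=(3, 3), page_table={0x10: 0xA0})
print(mesh.translate((0, 3), 0x10))  # first request travels to the center: (160, 3)
print(mesh.translate((0, 3), 0x10))  # repeat hits the cache next to the center: (160, 2)
```

In this toy model a repeated request is resolved one hop earlier because the chiplet adjacent to the center has cached the translation. A larger `cache_radius` would shorten the path further at the cost of more chiplets holding cached entries.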
Tue 3 Feb (displayed time zone: Hobart)
14:10 - 15:30 | Memory Systems for Scalable Computing | Main Conference at Collaroy | Chair(s): Alexandros Daglis (Georgia Tech)
14:10 | 20m | Talk | BARD: Reducing Write Latency of DDR5 Memory by Exploiting Bank-Parallelism | Main Conference
14:30 | 20m | Talk | RoMe: Row Granularity Access Memory System for Large Language Models | Main Conference | Hwayong Nam (Seoul National University), Seungmin Baek (Seoul National University), Jumin Kim (Seoul National University), Michael Jaemin Kim (Meta), Jung Ho Ahn (Seoul National University)
14:50 | 20m | Talk | HDPAT: Hierarchical Distributed Page Address Translation for Wafer-Scale GPUs | Main Conference | Daoxuan Xu (William & Mary), Ying Li (William & Mary), Yuwei Sun (UIUC), Jie Ren (William & Mary), Yifan Sun (William & Mary)
15:10 | 20m | Talk | Pulse: Fine-Grained Hierarchical Hashing Index for Disaggregated Memory | Main Conference | Guangyang Deng (Xiamen University), Zixiang Yu (Xiamen University), Zhirong Shen (Xiamen University), Qiangsheng Su (Xiamen University), Jiwu Shu (Xiamen University)