Cohet: A CXL-Driven Coherent Heterogeneous Computing Framework with Hardware-Calibrated Full-System Simulation
Conventional heterogeneous computing systems built on PCIe interconnects suffer from inefficient fine-grained host-device interactions and complex programming models. In recent years, many proprietary and open cache-coherent interconnect standards have emerged, among which compute express link (CXL) prevails in the open-standard domain after acquiring several competing solutions. Although CXL-based coherent heterogeneous computing holds the potential to fundamentally transform the collaborative computing mode of CPUs and XPUs, research in this direction remains hampered by the scarcity of available CXL-supported platforms, immature software/hardware ecosystems, and unclear application prospects. This paper presents Cohet, the first CXL-driven coherent heterogeneous computing framework. Cohet decouples the compute and memory resources to form unbiased CPU and XPU pools which share a single unified and coherent memory pool. It exposes a standard malloc/mmap interface to both CPU and XPU compute threads, which share a single per-process page table for user applications, leaving the OS dealing with smart memory allocation, page auto-migration, and management of heterogeneous resources. This design significantly simplifies heterogeneous parallel programming to a level comparable to homogeneous programming. To facilitate Cohet research, we also present a full-system cycle-level simulator named CXLSim, which is capable of modeling all CXL sub-protocols and device types. CXLSim has been rigorously calibrated against a real CXL testbed with various CXL memory and accelerators, showing an average simulation error of 3%. Our evaluation reveals that CXL.cache reduces latency by 68% and increases bandwidth by 14.4$\times$ compared to DMA transfers at cacheline granularity. Building upon these insights, we demonstrate the benefits of Cohet with two killer apps which are remote atomic operation (RAO) and remote procedure call (RPC). Compared to PCIe-NIC design, CXL-NIC achieves a 5.5 to 40.2$\times$ speedup for RAO offloading and an average speedup of 1.86$\times$ for RPC (de)serialization offloading.
Mon 2 FebDisplayed time zone: Hobart change
09:50 - 11:10 | Cache Coherence and Chiplet InterconnectsMain Conference at Collaroy Chair(s): Alberto Ros University of Murcia | ||
09:50 20mTalk | $C^3$ : CXL Coherence Controllers for Heterogeneous Architectures Main Conference Anatole Lefort Technical University of Munich (TUM), David Schall Technical University of Munich, Nicolò Carpentieri Technical University of Munich, Julian Pritzi Technical University of Munich, Soham Chakraborty TU Delft, Nicolai Oswald NVIDIA, Pramod Bhatotia TU Munich Pre-print | ||
10:10 20mTalk | Cohet: A CXL-Driven Coherent Heterogeneous Computing Framework with Hardware-Calibrated Full-System Simulation Main Conference Yanjing Wang National University of Defense Technology, Lizhou Wu National University of Defense Technology, Sunfeng Gao National University of Defense Technology, Yibo Tang National University of Defense Technology, Junhui Luo National University of Defense Technology, Zicong Wang National University of Defense Technology, Yang Ou National University of Defense Technology, Dezun Dong NUDT, Nong Xiao National University of Defense Technology & Sun Yat-sen University, Mingche Lai National University of Defense Technology | ||
10:30 20mTalk | PhasedStore: Supporting High-performance Write-through Cache-coherence Protocols under TSO Main Conference Burak Ocalan University of Illinois Urbana-Champaign, Chloe Alverti University of Illinois at Urbana-Champaign, Shashwat Jaiswal University of Illinois Urbana-Champaign, USA, Antonis Psistakis University of Illinois Urbana-Champaign, David Koufaty Unaffiliated, Suyash Mahar UC San Diego, Steven Swanson University of California San Diego, Josep Torrellas University of Illinois at Urbana-Champaign | ||
10:50 20mTalk | Deadlock-Free Bridge Module for Inter-Chiplet Communication in Open Chiplet Ecosystem Main Conference Zhiqiang Chen National University of Defense Technology, Wenwen Fu National University of Defense Technology, Yongwen Wang National University of Defense Technology, Hongwei Zhou National University of Defense Technology | ||