PhasedStore: Supporting High-performance Write-through Cache-coherence Protocols under TSO
Current multiprocessors that support the total store order (TSO) memory consistency model invariably use write-back (WB) cache-coherence protocols. When their hardware needs to issue write-through (WT) stores, as in uncached operations, these machines deliver dismal performance: writes to main memory must be fully serialized, often forcing the program to observe the full latency of a round trip to memory.
To solve this problem, this paper presents a novel architecture that supports high-performance cache-coherent WT stores under TSO. The architecture, called PhasedStore, extends the core's store queue and the directory. It executes individual WT stores in two phases, which allows them to fully overlap with other stores while still satisfying TSO.
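The performance gap between serialized and overlapped WT stores can be illustrated with a toy latency model. This is a sketch under stated assumptions, not PhasedStore's design: the round-trip latency and issue gap are hypothetical parameters, and the model deliberately ignores how the two phases preserve TSO ordering, which the abstract does not describe. It only contrasts waiting for each store's full round trip against issuing stores back to back.

```python
from dataclasses import dataclass


@dataclass
class StoreModel:
    """Toy latency model for a burst of WT stores (hypothetical parameters, in cycles)."""
    round_trip: int  # assumed latency of one WT store to memory and back
    issue_gap: int   # assumed minimum spacing between issuing consecutive stores

    def serialized(self, n: int) -> int:
        # Conventional WT under TSO: each store waits for the previous
        # store's full round trip before it can be issued.
        return n * self.round_trip

    def overlapped(self, n: int) -> int:
        # Idealized full overlap: stores issue back to back, so only the
        # last store's round trip is exposed after the final issue slot.
        if n == 0:
            return 0
        return (n - 1) * self.issue_gap + self.round_trip


# Hypothetical numbers, chosen only for illustration.
model = StoreModel(round_trip=400, issue_gap=4)
print(model.serialized(100))  # 40000
print(model.overlapped(100))  # 796
```

The model's gap grows with the number of back-to-back stores, which is why exposing every round trip is so costly. It says nothing about real applications, whose measured benefit (the 1.88x reported for PhasedStore) also depends on how often programs issue such store bursts.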
PhasedStore is useful in environments that require a WT cache-coherence protocol. This can be the case in resilience-critical platforms where node failures should not cause the loss of shared program state, or platforms with CPUs and accelerators where programs follow a producer-consumer pattern. This paper evaluates PhasedStore in the first environment, namely a CXL-based distributed shared-memory platform where shared data in the program uses a WT protocol to enable recovery. Our evaluation shows that PhasedStore is very effective. Compared to using the conventional approach to implement WT under TSO, PhasedStore reduces the average execution time of a set of parallel applications by 1.88x.
Mon 2 Feb
09:50 - 11:10 | Cache Coherence and Chiplet Interconnects | Main Conference at Collaroy | Chair(s): Alberto Ros, University of Murcia
09:50 20m Talk | C³: CXL Coherence Controllers for Heterogeneous Architectures | Main Conference | Anatole Lefort, Technical University of Munich (TUM); David Schall, Technical University of Munich; Nicolò Carpentieri, Technical University of Munich; Julian Pritzi, Technical University of Munich; Soham Chakraborty, TU Delft; Nicolai Oswald, NVIDIA; Pramod Bhatotia, TU Munich
10:10 20m Talk | Cohet: A CXL-Driven Coherent Heterogeneous Computing Framework with Hardware-Calibrated Full-System Simulation | Main Conference | Yanjing Wang, National University of Defense Technology; Lizhou Wu, National University of Defense Technology; Sunfeng Gao, National University of Defense Technology; Yibo Tang, National University of Defense Technology; Junhui Luo, National University of Defense Technology; Zicong Wang, National University of Defense Technology; Yang Ou, National University of Defense Technology; Dezun Dong, NUDT; Nong Xiao, National University of Defense Technology & Sun Yat-sen University; Mingche Lai, National University of Defense Technology
10:30 20m Talk | PhasedStore: Supporting High-performance Write-through Cache-coherence Protocols under TSO | Main Conference | Burak Ocalan, University of Illinois Urbana-Champaign; Chloe Alverti, University of Illinois at Urbana-Champaign; Shashwat Jaiswal, University of Illinois Urbana-Champaign, USA; Antonis Psistakis, University of Illinois Urbana-Champaign; David Koufaty, Unaffiliated; Suyash Mahar, UC San Diego; Steven Swanson, University of California San Diego; Josep Torrellas, University of Illinois at Urbana-Champaign
10:50 20m Talk | Deadlock-Free Bridge Module for Inter-Chiplet Communication in Open Chiplet Ecosystem | Main Conference | Zhiqiang Chen, National University of Defense Technology; Wenwen Fu, National University of Defense Technology; Yongwen Wang, National University of Defense Technology; Hongwei Zhou, National University of Defense Technology