HPCA 2026
Sat 31 January - Wed 4 February 2026 Sydney, Australia
co-located with HPCA/CGO/PPoPP/CC 2026
Mon 2 Feb 2026, 14:30 - 14:50, at Collaroy (Memory System Reliability). Chair(s): Haiyu Mao

Memory errors pose an escalating threat to datacenter reliability as DRAM technology scales to smaller nodes and servers process ever-larger datasets. While Error Correction Code (ECC) provides the first line of defense, uncorrectable errors still cause catastrophic server failures with significant economic impact. Memory mirroring offers complementary protection against these errors, but existing memory mirroring solutions must reserve specific memory regions exclusively for mirroring, which incurs significant capacity overhead and has limited their adoption. A recent proposal suggests leveraging free memory space for mirroring, but it leaves a critical research question unanswered: which data to mirror when free memory is limited. We therefore propose MemSOS, a selective memory mirroring system that dynamically chooses which pages to mirror based on their impact on system reliability. Specifically, MemSOS selects pages to mirror based on their criticality and recency: criticality is evaluated by examining the page type, while recency serves as a proxy for the likelihood of future access. Our evaluation demonstrates that MemSOS reduces system Failures in Time (FIT) by up to 19,000× compared to a state-of-the-art partial mirroring scheme, while keeping performance overhead below 3%. In many cases, MemSOS achieves reliability comparable to full mirroring, underscoring its effectiveness in maximizing system availability when free memory is limited.
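The selection policy described above (rank pages by criticality, judged from page type, and by recency, as a proxy for future access, then mirror as many as free memory allows) can be sketched as follows. This is a minimal illustration, not the MemSOS implementation: the page-type names, their criticality scores, the logical access timestamps, the assumption that each mirrored page consumes exactly one free page, and the choice to order by criticality first and recency second are all hypothetical, as is the function `select_pages_to_mirror`.

```python
from dataclasses import dataclass

# Hypothetical criticality scores per page type (higher = more critical).
# The abstract only says criticality is judged from page type; these
# categories and values are illustrative assumptions.
CRITICALITY = {"kernel": 3, "page_table": 3, "anon": 2, "file": 1}


@dataclass
class Page:
    pfn: int          # page frame number
    page_type: str    # one of the hypothetical types above
    last_access: int  # logical timestamp of the most recent access


def select_pages_to_mirror(pages: list[Page], free_pages: int) -> list[int]:
    """Return the PFNs of up to `free_pages` pages to mirror.

    Assumes each mirrored page needs one free page for its copy.
    Pages are ranked by criticality first, then by recency (more
    recently accessed pages are treated as more likely to be reused).
    """
    budget = max(free_pages, 0)  # guard against a negative budget
    ranked = sorted(
        pages,
        key=lambda p: (CRITICALITY.get(p.page_type, 0), p.last_access),
        reverse=True,
    )
    return [p.pfn for p in ranked[:budget]]


if __name__ == "__main__":
    pages = [
        Page(1, "file", 50),
        Page(2, "kernel", 10),
        Page(3, "anon", 90),
        Page(4, "anon", 20),
        Page(5, "page_table", 5),
    ]
    # With room for three copies, the two critical pages are mirrored,
    # then the most recently used anonymous page.
    print(select_pages_to_mirror(pages, 3))  # [2, 5, 3]
```

Ordering lexicographically keeps the example simple; a real system might instead blend the two signals into a single score, or re-rank periodically as free memory grows and shrinks.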

Mon 2 Feb

Displayed time zone: Hobart

14:10 - 15:30
Memory System Reliability (Main Conference) at Collaroy
Chair(s): Haiyu Mao, King's College London
14:10 (20m) Talk
Predicting DRAM Failures at Scale: A Two-Stage Approach for Heterogeneous Systems
Main Conference
Chenglin Wang (Xiamen University), Shouxin Wang (Xiamen University), Shuyue Zhou (Xiamen University), Ronglong Wu (Xiamen University), Zhirong Shen (Xiamen University), Lu Tang (Xiamen University), Yiming Zhang (Xiamen University), Jialiang Yu (Huawei), Min Zhou (Huawei)
14:30 (20m) Talk
MemSOS: OS-Guided Selective Memory Mirroring
Main Conference
Junghoon Kim (Seoul National University & Samsung Electronics), Jongheon Jeong (Seoul National University), Seokwon Moon (Seoul National University), Seong Hoon Seo (Seoul National University), Yeonhong Park (Seoul National University), Jinkyu Jeong (Yonsei University), Nam Sung Kim (UIUC), Jae W. Lee (Seoul National University)
14:50 (20m) Talk
ASPA: Reassigning DDR5 Parity Bandwidth
Main Conference
Fan Li (University of Central Florida), Qiufeng Li (George Washington University), Yanan Guo (University of Rochester), Weidong Cao (George Washington University), Xin Xin (University of Central Florida)
15:10 (20m) Talk
HR-DCIM: High-Reliability Floating-Point Digital CIM Architecture with Unified Low-Cost Iterative Error Correction
Main Conference
Zhen He (Tsinghua University), Yiqi Wang (Tsinghua University), Zhiheng Yue (Tsinghua University), Zihan Wu (Tsinghua University), Huiming Han (Tsinghua University), Shaojun Wei (Tsinghua University), Yang Hu (Tsinghua University), Fengbin Tu (The Hong Kong University of Science and Technology), Shouyi Yin (Tsinghua University)