HPCA 2026
Sat 31 January - Wed 4 February 2026 Sydney, Australia
co-located with HPCA/CGO/PPoPP/CC 2026
Mon 2 Feb 2026 15:10 - 15:30 at Collaroy - Memory System Reliability. Chair(s): Haiyu Mao

Digital computing-in-memory (CIM) is a promising computing paradigm for neural network (NN) acceleration. However, during the actual deployment of digital CIM chips, we find that existing digital CIM designs face severe computing reliability issues. Computing reliability is crucial for real product development but remains underexplored. Thus, this work pioneers a systematic computing reliability analysis for digital CIM at both the off-memory and in-memory levels. We find that both off-memory floating-point (FP) exponent alignment and in-memory random cell bit-flip errors impair digital CIM’s computing reliability, causing significant truncation and bit-flip accuracy loss, respectively. Critically, existing reliability solutions are incompatible with the unique multi-row accumulation structure of digital CIM: they either severely degrade digital CIM’s performance or incur prohibitive overhead.
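To make the off-memory truncation loss concrete, the following Python sketch models FP exponent pre-alignment ahead of an integer MAC. Every input mantissa is right-shifted to the largest exponent in the group, so the low-order bits of small-magnitude inputs fall off the end. The helper names (`decompose`, `aligned_dot`), the 8-bit default mantissa width, and the use of integer weights are illustrative assumptions, not HR-DCIM's actual datapath.

```python
import math
from typing import List, Tuple

def decompose(x: float, mant_bits: int) -> Tuple[int, int]:
    """Split x into an integer mantissa m and exponent e, x ~= m * 2**(e - mant_bits)."""
    if x == 0.0:
        return 0, 0
    frac, exp = math.frexp(x)              # x = frac * 2**exp, 0.5 <= |frac| < 1
    return int(round(frac * (1 << mant_bits))), exp

def aligned_dot(xs: List[float], ws: List[int], mant_bits: int = 8) -> float:
    """Dot product with FP inputs pre-aligned to the group's maximum exponent.

    Each mantissa keeps only `mant_bits` bits after alignment, so bits shifted
    past the LSB are truncated before the integer MAC (the truncation loss).
    This is a hypothetical model for illustration only.
    """
    parts = [decompose(x, mant_bits) for x in xs]
    # Zero inputs carry no meaningful exponent, so they do not set the alignment.
    e_max = max((e for m, e in parts if m != 0), default=0)
    acc = 0
    for (m, e), w in zip(parts, ws):
        if m == 0:
            continue                        # zero inputs contribute nothing
        shift = e_max - e
        # Truncate toward zero, as a plain right shift on the magnitude would.
        m_aligned = m >> shift if m >= 0 else -((-m) >> shift)
        acc += m_aligned * w                # integer MAC, as in a digital CIM column
    return acc * 2.0 ** (e_max - mant_bits)

xs, ws = [1.0, 0.001], [1, 1]
print(aligned_dot(xs, ws))  # prints "1.0"
```

Here the 0.001 term is shifted right by 10 positions and vanishes entirely, so the aligned result is 1.0 instead of the exact 1.001. Keeping extra low-order bits after alignment (a larger `mant_bits`) reduces this loss. The abstract does not detail how HR-DCIM obtains such compensation bits from otherwise invalid mantissa bits.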

To address these challenges, we propose HR-DCIM: a high-reliability FP digital CIM architecture featuring unified, low-cost iterative error correction. Specifically, for off-memory reliability, we propose an exponent-mantissa joint-alignment mechanism that repurposes the inherently invalid bits of aligned mantissas as compensation bits, reducing alignment truncation loss without degrading digital CIM’s performance. Then, for in-memory reliability, we propose a remainder aliasing-based unified multiply-accumulate (MAC) error correction mechanism that corrects MAC errors arising from various cell error cases through low-cost iteration. Experimental results show that the proposed techniques enable digital CIM to maintain high performance and efficiency across various operating voltage conditions without significant accuracy loss.
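The abstract does not spell out the remainder aliasing-based mechanism. As a hedged illustration of the general idea of remainder (residue) checking for MAC errors, the Python sketch below uses a classic modulo-m residue code instead. Weight residues are assumed to be stored when the weights are written, a cheap checker computes the MAC modulo m, and a nonzero syndrome flags a cell bit-flip in the main array. The modulus m = 7, the 4-bit weight width, and all function names are hypothetical choices for this sketch, not HR-DCIM's design.

```python
from typing import List, Tuple

def mac(inputs: List[int], weights: List[int]) -> int:
    """Exact integer multiply-accumulate, as a digital CIM column computes it."""
    return sum(x * w for x, w in zip(inputs, weights))

def residue_mac(inputs: List[int], weight_residues: List[int], m: int) -> int:
    """Low-cost checker: the same MAC computed on residues modulo m.

    weight_residues are assumed to be computed when the weights are written,
    so later bit-flips in the main weight cells do not reach them.
    """
    return sum((x % m) * r for x, r in zip(inputs, weight_residues)) % m

def syndrome(observed: int, expected_residue: int, m: int) -> int:
    """Zero if the observed MAC agrees with the checker modulo m."""
    return (observed - expected_residue) % m

def single_flip_candidates(inputs: List[int], stored: List[int], syn: int,
                           m: int, width: int = 4) -> List[Tuple[int, int]]:
    """Return every (row, bit) single-cell flip whose error matches the syndrome."""
    hits = []
    for row, (x, w_stored) in enumerate(zip(inputs, stored)):
        for bit in range(width):
            w_orig = w_stored ^ (1 << bit)    # value before the hypothesised flip
            if (x * (w_stored - w_orig)) % m == syn:
                hits.append((row, bit))
    return hits

M = 7                                     # hypothetical checker modulus
inputs = [1, 2, 3, 4]
weights = [3, 5, 2, 6]                    # 4-bit unsigned weights
residues = [w % M for w in weights]       # stored alongside the weights
expected = residue_mac(inputs, residues, M)

corrupted = list(weights)
corrupted[1] ^= 1 << 2                    # cell bit-flip: row 1, bit 2 (5 -> 1)
s = syndrome(mac(inputs, corrupted), expected, M)
print(s, single_flip_candidates(inputs, corrupted, s, M))  # prints "6 [(0, 3), (1, 2), (2, 1)]"

# Aliasing: an input that is a multiple of M hides the same kind of flip.
inputs2 = [7, 2, 3, 4]
corrupted2 = list(weights)
corrupted2[0] ^= 1 << 1                   # row 0, bit 1 (3 -> 1)
print(syndrome(mac(inputs2, corrupted2), residue_mac(inputs2, residues, M), M))  # prints "0"
```

The flipped cell is detected (syndrome 6), but three different single-bit flips produce the same syndrome, and a flip whose input is a multiple of m leaves no trace at all. These remainder-aliasing effects are what any residue-based scheme must handle; the sketch makes no claim about how HR-DCIM resolves them.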

Mon 2 Feb


14:10 - 15:30
Memory System Reliability (Main Conference) at Collaroy
Chair(s): Haiyu Mao King's College London
14:10
20m
Talk
Predicting DRAM Failures at Scale: A Two-Stage Approach for Heterogeneous Systems
Main Conference
Chenglin Wang Xiamen University, Shouxin Wang Xiamen University, Shuyue Zhou Xiamen University, Ronglong Wu Xiamen University, Zhirong Shen Xiamen University, Lu Tang Xiamen University, Yiming Zhang Xiamen University, Jialiang Yu Huawei, Min Zhou Huawei
14:30
20m
Talk
MemSOS: OS-Guided Selective Memory Mirroring
Main Conference
Junghoon Kim Seoul National University & Samsung Electronics, Jongheon Jeong Seoul National University, Seokwon Moon Seoul National University, Seong Hoon Seo Seoul National University, Yeonhong Park Seoul National University, Jinkyu Jeong Yonsei University, Nam Sung Kim UIUC, Jae W. Lee Seoul National University
14:50
20m
Talk
ASPA: Reassigning DDR5 Parity Bandwidth
Main Conference
Fan Li University of Central Florida, Qiufeng Li George Washington University, Yanan Guo University of Rochester, Weidong Cao George Washington University, Xin Xin University of Central Florida
15:10
20m
Talk
HR-DCIM: High-Reliability Floating-Point Digital CIM Architecture with Unified Low-Cost Iterative Error Correction
Main Conference
Zhen He Tsinghua University, Yiqi Wang Tsinghua University, Zhiheng Yue Tsinghua University, Zihan Wu Tsinghua University, Huiming Han Tsinghua University, Shaojun Wei Tsinghua University, Yang Hu Tsinghua University, Fengbin Tu The Hong Kong University of Science and Technology, Shouyi Yin Tsinghua University