HPCA 2026
Sat 31 January - Wed 4 February 2026 Sydney, Australia
co-located with HPCA/CGO/PPoPP/CC 2026
Mon 2 Feb 2026 10:10 - 10:30 at Cronulla - Homomorphic Encryption Acceleration Chair(s): Jung Ho Ahn

Homomorphic Encryption (HE) provides strong data privacy for cloud services but at the cost of prohibitive computational overhead. While GPUs have emerged as a practical platform for accelerating HE, there remains an order-of-magnitude energy-efficiency gap compared to specialized (but expensive) HE ASICs.

This paper explores an alternate direction: leveraging existing AI accelerators, like Google’s TPUs with coarse-grained compute and memory architectures, to offer a path toward ASIC-level energy efficiency for HE. However, this architectural paradigm creates a fundamental mismatch with SoTA HE algorithms designed for GPUs. These algorithms rely heavily on: (1) high-precision (32-bit) integer arithmetic, which on a TPU must run on the low-throughput vector unit, leaving the high-throughput low-precision (8-bit) matrix engine (MXU) idle, and (2) fine-grained data permutations that are inefficient on the TPU’s coarse-grained memory subsystem. Consequently, porting GPU-optimized HE libraries to TPUs results in severe resource under-utilization and performance degradation.

To tackle these challenges, we introduce CROSS, a compiler framework that systematically transforms HE workloads to align with the TPU’s architecture. CROSS makes two key contributions: (1) Basis-Aligned Transformation (BAT), a novel technique that converts high-precision modular arithmetic into dense, low-precision (INT8) matrix multiplications, unlocking and improving the utilization of the TPU’s MXU for HE, and (2) Memory-Aligned Transformation (MAT), which eliminates costly runtime data reordering by embedding reordering into compute kernels through offline parameter transformation.
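The two ideas can be illustrated with a small pure-Python sketch. This is not the CROSS implementation or the paper's BAT/MAT algorithms. It is a generic byte-limb decomposition written under stated assumptions: a modulus `Q` below 2^32, a parameter matrix `W` known ahead of time (as NTT twiddle factors are), and hypothetical helper names (`to_limbs`, `byte_matvec`, `precompute_byte_matrices`, `modmatvec_bytes`, `bit_reversal`). The first part shows how a 32-bit modular matrix-vector product can be expressed as products over 8-bit operands. The second shows how an output reordering can be folded into the parameter matrix offline instead of being applied at runtime.

```python
# Illustrative sketch only: generic limb decomposition, not the CROSS code.
Q = 268369921  # example modulus below 2**32 (assumed for illustration)
LIMBS = 4      # four 8-bit limbs cover one 32-bit operand


def to_limbs(value):
    """Split a non-negative integer below 2**32 into little-endian bytes."""
    return [(value >> (8 * k)) & 0xFF for k in range(LIMBS)]


def byte_matvec(mat, vec):
    """Matrix-vector product whose operands are all 8-bit values."""
    assert all(0 <= e < 256 for row in mat for e in row)
    assert all(0 <= e < 256 for e in vec)
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]


def precompute_byte_matrices(W, q=Q):
    """Offline: fold each limb weight 2**(8k) into W, reduce mod q, and
    split the result into byte planes. tables[k][j] is byte plane j of
    (W * 2**(8k)) mod q."""
    tables = []
    for k in range(LIMBS):
        Wk = [[(w << (8 * k)) % q for w in row] for row in W]
        tables.append([[[to_limbs(w)[j] for w in row] for row in Wk]
                       for j in range(LIMBS)])
    return tables


def modmatvec_bytes(tables, x, q=Q):
    """Runtime: (W @ x) mod q built from byte-by-byte products only.
    The shifts and the final reduction still use wider integers."""
    x_planes = [[to_limbs(v)[k] for v in x] for k in range(LIMBS)]
    acc = [0] * len(tables[0][0])
    for k in range(LIMBS):
        for j in range(LIMBS):
            partial = byte_matvec(tables[k][j], x_planes[k])
            acc = [a + (p << (8 * j)) for a, p in zip(acc, partial)]
    return [a % q for a in acc]


def bit_reversal(n):
    """Bit-reversal permutation for a power-of-two length n >= 2."""
    bits = n.bit_length() - 1
    return [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]


# Small example with arbitrary values below Q.
W = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [Q - 1, Q - 2, 11, 12],
     [123456789, 14, 15, 16]]
x = [Q - 1, 987654321 % Q, 42, 7]

# Reference result using ordinary wide-integer arithmetic.
direct = [sum(w * v for w, v in zip(row, x)) % Q for row in W]
via_bytes = modmatvec_bytes(precompute_byte_matrices(W), x)

# Reordering folded into the parameters: permute the rows of W once,
# offline, so the product already comes out in the reordered layout.
perm = bit_reversal(len(W))
W_reordered = [W[p] for p in perm]
fused = modmatvec_bytes(precompute_byte_matrices(W_reordered), x)
```

In this sketch `via_bytes` matches `direct`, and `fused` equals `direct` with its entries permuted by `perm`, so no separate runtime shuffle is needed. How CROSS actually maps such products onto the MXU and handles the modular reduction is described in the paper and the linked repository, not here.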

Our evaluation on a real single-host Google TPU v6e sets a new SoTA Number Theoretic Transform (NTT) throughput record, with up to 1.43x throughput improvement over WarpDrive on an NVIDIA A100. Furthermore, CROSS achieves 451x, 7.81x, 1.83x, 1.31x, 1.86x, and 1.15x higher throughput per watt than OpenFHE, WarpDrive, FIDESlib, FAB, HEAP, and Cheddar, respectively, establishing AI ASICs as the SoTA efficient platform for HE operators. Code: https://github.com/EfficientPPML/CROSS.

Mon 2 Feb

Displayed time zone: Hobart

09:50 - 11:10
Homomorphic Encryption Acceleration
Main Conference at Cronulla
Chair(s): Jung Ho Ahn Seoul National University
09:50
20m
Talk
UniFHE: Faster Accelerator for FHE with Diverse Algebraic Structure and Balanced Memory System
Main Conference
Qingyun Niu Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS and School of Cyber Security, University of Chinese Academy of Sciences, Lutan Zhao State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS, Ming Cai Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS and School of Cyber Security, University of Chinese Academy of Sciences, Kai Li Institute of Information Engineering, CAS, Dan Meng Institute of Information Engineering at Chinese Academy of Sciences; University of Chinese Academy of Sciences, Rui Hou Institute of Information Engineering, CAS
10:10
20m
Talk
Leveraging ASIC AI Chips for Homomorphic Encryption
Main Conference
Jianming Tong Georgia Institute of Technology, Tianhao Huang MIT, Leo de Castro MIT, Anirudh Itagi Georgia Institute of Technology, Jingtian Dang Georgia Tech, Anupam Golder Georgia Institute of Technology, Asra Ali Google, Jevin Jiang Google, Jeremy Kun Google, Arvind Massachusetts Institute of Technology, G. Edward Suh Cornell University, USA, Tushar Krishna Georgia Institute of Technology
Pre-print
10:30
20m
Talk
CROPHE: Cross-Operator Dataflow Optimization for Fully Homomorphic Encryption Accelerators
Main Conference
Xinhua Chen Fudan University, Jiangbin Dong Xi'an Jiaotong University, Hongren Zheng Tsinghua University, Tian Tang Tsinghua University, Mingyu Gao Tsinghua University
10:50
20m
Talk
Peregrine: Accelerating TFHE Bootstrapping on GPUs via Multi-Level External Product Co-Design
Main Conference
Haoqi He State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, Chinese Academy of Sciences and School of Cyber Security, University of Chinese Academy of Sciences, Zhiwei Wang State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS, Lutan Zhao State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS, Dian Jiao State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS, Dan Meng Institute of Information Engineering at Chinese Academy of Sciences; University of Chinese Academy of Sciences, Rui Hou Institute of Information Engineering, CAS