Leveraging ASIC AI Chips for Homomorphic Encryption (HPCA 2026 - Main Conference)

Who

Jianming Tong, Tianhao Huang, Leo de Castro, Anirudh Itagi, Jingtian Dang, Anupam Golder, Asra Ali, Jevin Jiang, Jeremy Kun, Arvind, G. Edward Suh, Tushar Krishna

Track

HPCA 2026 Main Conference

Time Zone

The program is currently displayed in (GMT+11:00) Hobart.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Mon 2 Feb 2026 10:10 - 10:30 at Cronulla - Homomorphic Encryption Acceleration Chair(s): Jung Ho Ahn

Abstract

Homomorphic Encryption (HE) provides strong data privacy for cloud services but at the cost of prohibitive computational overhead. While GPUs have emerged as a practical platform for accelerating HE, there remains an order-of-magnitude energy-efficiency gap compared to specialized (but expensive) HE ASICs.

This paper explores an alternate direction: leveraging existing AI accelerators, such as Google’s TPUs with their coarse-grained compute and memory architectures, to offer a path toward ASIC-level energy efficiency for HE. However, this architectural paradigm creates a fundamental mismatch with SoTA HE algorithms designed for GPUs. These algorithms rely heavily on: (1) high-precision (32-bit) integer arithmetic, which must run on the TPU’s low-throughput vector unit, leaving its high-throughput, low-precision (8-bit) matrix engine (MXU) idle, and (2) fine-grained data permutations that are inefficient on the TPU’s coarse-grained memory subsystem. Consequently, porting GPU-optimized HE libraries to TPUs results in severe resource under-utilization and performance degradation.

To tackle these challenges, we introduce CROSS, a compiler framework that systematically transforms HE workloads to align with the TPU’s architecture. CROSS makes two key contributions: (1) Basis-Aligned Transformation (BAT), a novel technique that converts high-precision modular arithmetic into dense, low-precision (INT8) matrix multiplications, unlocking the TPU’s MXU for HE and improving its utilization, and (2) Memory-Aligned Transformation (MAT), which eliminates costly runtime data reordering by embedding reordering into compute kernels through offline parameter transformation.
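As a rough illustration of how high-precision modular arithmetic can be recast as a small 8-bit matrix product, the sketch below multiplies a 32-bit value by a known constant modulo q using only byte-sized operands. This is a generic limb-decomposition identity, not the BAT algorithm from the paper; the modulus `Q`, the function names, and the four-limb layout are all assumptions chosen for the example.

```python
# Minimal sketch (assumption: NOT the paper's BAT algorithm).
# Goal: compute (a * b) mod Q with only 8-bit x 8-bit products,
# where b is a constant known offline (for example, a twiddle factor).

Q = 268369921   # hypothetical modulus below 2^32, chosen for illustration
LIMBS = 4       # a 32-bit value splits into four 8-bit limbs


def to_limbs(x):
    """Split x (< 2^32) into four bytes, least significant first."""
    return [(x >> (8 * i)) & 0xFF for i in range(LIMBS)]


def precompute_matrix(b, q):
    """Offline step: row i holds the bytes of (b * 2^(8i)) mod q.

    The modular reduction is folded into this constant 4x4 byte matrix.
    """
    return [to_limbs((b << (8 * i)) % q) for i in range(LIMBS)]


def modmul_via_bytes(a, matrix, q):
    """Runtime step: a 1x4 by 4x4 product over 8-bit operands."""
    a_limbs = to_limbs(a)
    # Each column sum is at most 4 * 255 * 255, so it fits a 32-bit accumulator.
    cols = [sum(a_limbs[i] * matrix[i][j] for i in range(LIMBS))
            for j in range(LIMBS)]
    # Recombine the columns with their byte weights, then reduce once.
    acc = sum(c << (8 * j) for j, c in enumerate(cols))
    return acc % q


if __name__ == "__main__":
    a, b = 123456789, 87654321
    m = precompute_matrix(b, Q)
    print(modmul_via_bytes(a, m, Q) == (a * b) % Q)
```

The identity holds because a equals the sum of its limbs times 2^(8i), so a*b mod Q is congruent to the sum of each limb times (b * 2^(8i) mod Q). Batching many such products would turn the inner sums into a dense low-precision matrix multiplication, which is the kind of workload a matrix engine is built for.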

Our evaluation on a real single-host Google TPU v6e sets a new SoTA Number Theoretic Transform (NTT) throughput record, with up to 1.43x higher throughput than WarpDrive on an NVIDIA A100. Furthermore, CROSS achieves 451x, 7.81x, 1.83x, 1.31x, 1.86x, and 1.15x higher throughput per watt than OpenFHE, WarpDrive, FIDESlib, FAB, HEAP, and Cheddar, respectively, establishing AI ASICs as the SoTA efficient platform for HE operators. Code: https://github.com/EfficientPPML/CROSS
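The idea of embedding a data reordering into offline-transformed parameters can be made concrete with a toy NTT written as a matrix-vector product. The sketch below is not the paper's MAT implementation; it only demonstrates the underlying identity P(Wx) = (PW)x with assumed toy parameters (`Q = 17`, `N = 8`, `OMEGA = 2`) and hypothetical function names.

```python
# Minimal sketch (assumption: NOT the paper's MAT implementation).
# A bit-reversal permutation of the NTT output is folded into the
# twiddle matrix offline, so no runtime data shuffle is needed.

Q = 17      # toy modulus (assumption)
N = 8       # toy transform size (assumption)
OMEGA = 2   # 2 has multiplicative order 8 modulo 17


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def ntt_matrix(n, omega, q):
    """Dense NTT matrix W with W[r][c] = omega^(r*c) mod q."""
    return [[pow(omega, r * c, q) for c in range(n)] for r in range(n)]


def matvec(m, x, q):
    """Matrix-vector product modulo q."""
    return [sum(a * b for a, b in zip(row, x)) % q for row in m]


PERM = [bit_reverse(i, 3) for i in range(N)]
W = ntt_matrix(N, OMEGA, Q)
# Offline step: permute the rows of W once, ahead of time.
W_REORDERED = [W[PERM[r]] for r in range(N)]


def ntt_then_permute(x):
    """Baseline: transform, then shuffle the output at runtime."""
    y = matvec(W, x, Q)
    return [y[PERM[r]] for r in range(N)]


def ntt_fused(x):
    """Reordering already embedded in the parameters."""
    return matvec(W_REORDERED, x, Q)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    print(ntt_fused(x) == ntt_then_permute(x))
```

Both functions return the same vector, but the fused version performs no element-wise permutation at runtime; the cost moves into a one-time rearrangement of constants.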

Link to Preprint

https://arxiv.org/abs/2501.07047

Jianming Tong

Georgia Institute of Technology

United States

Tianhao Huang

MIT

Leo de Castro

MIT

Anirudh Itagi

Georgia Institute of Technology

Jingtian Dang

Georgia Tech

Anupam Golder

Georgia Institute of Technology

Asra Ali

Google

Jevin Jiang

Google

Jeremy Kun

Google

Arvind

Massachusetts Institute of Technology

United States

G. Edward Suh

Cornell University, USA

Tushar Krishna

Georgia Institute of Technology

Session Program

Mon 2 Feb
Displayed time zone: Hobart

09:50 - 11:10	Homomorphic Encryption Acceleration (Main Conference) at Cronulla. Chair(s): Jung Ho Ahn (Seoul National University)

09:50 20m Talk		UniFHE: Faster Accelerator for FHE with Diverse Algebraic Structure and Balanced Memory System (Main Conference). Qingyun Niu (Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS and School of Cyber Security, University of Chinese Academy of Sciences); Lutan Zhao (State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS); Ming Cai (Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS and School of Cyber Security, University of Chinese Academy of Sciences); Kai Li (Institute of Information Engineering, CAS); Dan Meng (Institute of Information Engineering at Chinese Academy of Sciences; University of Chinese Academy of Sciences); Rui Hou (Institute of Information Engineering, CAS)
10:10 20m Talk		Leveraging ASIC AI Chips for Homomorphic Encryption (Main Conference, pre-print available). Jianming Tong (Georgia Institute of Technology); Tianhao Huang (MIT); Leo de Castro (MIT); Anirudh Itagi (Georgia Institute of Technology); Jingtian Dang (Georgia Tech); Anupam Golder (Georgia Institute of Technology); Asra Ali (Google); Jevin Jiang (Google); Jeremy Kun (Google); Arvind (Massachusetts Institute of Technology); G. Edward Suh (Cornell University, USA); Tushar Krishna (Georgia Institute of Technology)
10:30 20m Talk		CROPHE: Cross-Operator Dataflow Optimization for Fully Homomorphic Encryption Accelerators (Main Conference). Xinhua Chen (Fudan University); Jiangbin Dong (Xi'an Jiaotong University); Hongren Zheng (Tsinghua University); Tian Tang (Tsinghua University); Mingyu Gao (Tsinghua University)
10:50 20m Talk		Peregrine: Accelerating TFHE Bootstrapping on GPUs via Multi-Level External Product Co-Design (Main Conference). Haoqi He (State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, Chinese Academy of Sciences and School of Cyber Security, University of Chinese Academy of Sciences); Zhiwei Wang (State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS); Lutan Zhao (State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS); Dian Jiao (State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS); Dan Meng (Institute of Information Engineering at Chinese Academy of Sciences; University of Chinese Academy of Sciences); Rui Hou (Institute of Information Engineering, CAS)