HPCA 2026
Sat 31 January - Wed 4 February 2026 Sydney, Australia
co-located with HPCA/CGO/PPoPP/CC 2026
Tue 3 Feb 2026 12:10 - 12:30 at Cronulla - Zero-Knowledge and Private Information Retrieval Chair(s): Hanjun Kim

Fully Homomorphic Encryption (FHE) has become a promising approach to protecting data privacy in emerging application scenarios. Unfortunately, FHE is significantly slower than plaintext computation, and one of the primary bottlenecks is the time-consuming Number Theoretic Transform (NTT). Accelerating NTT across a range of FHE parameters is therefore crucial to making FHE practical. With their highly reconfigurable and performant logic fabrics, Field Programmable Gate Arrays (FPGAs) have shown great potential for NTT acceleration.

By decomposing a large-point NTT with strong data dependencies into independent, simple small-point NTTs, the emerging Ten-step NTT (TNTT) algorithm intuitively enables higher parallelism and thereby has the potential to achieve better performance than traditional algorithms. However, our quantitative analysis reveals that TNTT exhibits significant performance degradation as parallelism increases, due to the additional varying-size transpositions and Hadamard products.
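To make the decomposition idea concrete, the following is a minimal Python sketch of the classic four-step NTT, not the TNTT algorithm itself (the abstract does not spell out TNTT's steps). It shows how one large transform splits into independent small transforms joined by a Hadamard (element-wise) twiddle product and a transposition. The function names and the toy parameters (modulus 17, root of unity 3, size 16) are assumptions chosen for illustration only.

```python
def naive_ntt(x, omega, p):
    """Direct O(n^2) NTT of x modulo p; omega is a primitive len(x)-th root of unity."""
    n = len(x)
    return [sum(x[j] * pow(omega, j * k, p) for j in range(n)) % p
            for k in range(n)]


def four_step_ntt(x, n1, n2, omega, p):
    """Four-step NTT of length n1*n2 (illustrative sketch, not TNTT).

    Input index is split as n2*r + c, output index as k1 + n1*k2.
    """
    assert len(x) == n1 * n2
    # Step 1: n2 independent n1-point NTTs, one per column of the n1 x n2 view.
    w1 = pow(omega, n2, p)  # primitive n1-th root of unity
    cols = [naive_ntt([x[n2 * r + c] for r in range(n1)], w1, p)
            for c in range(n2)]
    # Step 2: Hadamard product with twiddle factors omega^(c * k1).
    cols = [[cols[c][k1] * pow(omega, c * k1, p) % p for k1 in range(n1)]
            for c in range(n2)]
    # Step 3: transposition so that each row holds a fixed k1.
    rows = [[cols[c][k1] for c in range(n2)] for k1 in range(n1)]
    # Step 4: n1 independent n2-point NTTs, one per row.
    w2 = pow(omega, n1, p)  # primitive n2-th root of unity
    rows = [naive_ntt(row, w2, p) for row in rows]
    # Scatter rows[k1][k2] to output position k1 + n1*k2.
    out = [0] * (n1 * n2)
    for k1 in range(n1):
        for k2 in range(n2):
            out[k1 + n1 * k2] = rows[k1][k2]
    return out


# Toy parameters (assumed for illustration): 3 has order 16 modulo 17.
P, OMEGA = 17, 3
data = list(range(16))
assert four_step_ntt(data, 4, 4, OMEGA, P) == naive_ntt(data, OMEGA, P)
```

In this sketch the small NTTs in steps 1 and 4 are mutually independent, which is the source of parallelism, while steps 2 and 3 are the extra Hadamard product and transposition whose cost the analysis above refers to.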

This paper proposes AutoNest, an efficient and scalable hardware architecture, along with an accelerator auto-generation framework for TNTT. The proposed hardware architecture maximizes performance by 1) adopting a 2D block decomposition dataflow to address critical path delays in the transpose logic, thereby improving clock frequency; and 2) integrating algorithm-level, cost-free twiddle factor fusion to reduce the number of modular multiplications in Hadamard products, thereby allowing higher on-chip parallelism. We also deliver an accelerator generation framework that performs automated design space exploration to elaborate a performant TNTT architecture within the target FPGA's resource budget for user-defined FHE parameters. Experimental results on the AMD-Xilinx U280 FPGA demonstrate that NTT accelerators generated by AutoNest achieve an average speedup of 2.31× compared to prior designs.
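The abstract does not describe the 2D block decomposition dataflow in detail. As a loose software analogue only, the sketch below transposes a square matrix tile by tile: each small block is transposed locally and written to the mirrored block position, so no single step has to move data across the whole matrix. The function name, the block size parameter, and the mapping to AutoNest's actual hardware are assumptions.

```python
def blocked_transpose(m, b):
    """Transpose a square matrix m in b x b tiles (illustrative analogue only)."""
    n = len(m)
    assert n % b == 0 and all(len(row) == n for row in m)
    out = [[0] * n for _ in range(n)]
    for bi in range(0, n, b):          # block row of the source tile
        for bj in range(0, n, b):      # block column of the source tile
            for i in range(b):
                for j in range(b):
                    # Element (i, j) of tile (bi, bj) lands at
                    # element (j, i) of the mirrored tile (bj, bi).
                    out[bj + j][bi + i] = m[bi + i][bj + j]
    return out


matrix = [[4 * r + c for c in range(4)] for r in range(4)]
assert blocked_transpose(matrix, 2) == [list(col) for col in zip(*matrix)]
```

In hardware terms, keeping each data movement local to a small tile is one plausible way to shorten wiring in transpose logic, but how AutoNest realizes this is not stated in the abstract.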

Tue 3 Feb

Displayed time zone: Hobart

11:30 - 12:50
Zero-Knowledge and Private Information Retrieval
Main Conference at Cronulla
Chair(s): Hanjun Kim POSTECH
11:30
20m
Talk
zkPHIRE: A Programmable Accelerator for ZKPs over HIgh-degRee, Expressive Gates
Main Conference
Alhad Daftardar New York University, Jianqiao Cambridge Mo New York University, Joey Ah-kiow New York University, Benedikt Bünz New York University, Siddharth Garg New York University, Brandon Reagen New York University
11:50
20m
Talk
Conflux: A High-Performance Keyword Private Retrieval System for Dynamic Datasets
Main Conference
Zehao Chen Shandong University, Zhaoyan Shen Shandong University, Qian Wei Shandong University, Hang Lu Institute of Computing Technology, Chinese Academy of Sciences, Lei Ju Shandong University
12:10
20m
Talk
An Efficient and Scalable Hardware Architecture for Number Theoretic Transform on FPGA with Design Automation
Main Conference
Yilan Zhu Ant Group, Geng Yang Ant Group, Xingyu Tian Simon Fraser University, Dilshan Kumarathunga Simon Fraser University, Liang Kong Ant Group, Xianglong Deng UCAS, Shengyu Fan UCAS, Guang Fan Ant Group, Guiming Shi Tsinghua University, Lei Chen University of Chinese Academy of Sciences, Bo Zhang Ant Group, Yisong Chang Ant Group, Shoumeng Yan Ant Group, Zhenman Fang Simon Fraser University, Mingzhe Zhang Ant Group
12:30
20m
Talk
IVE: An Accelerator for Single-Server Private Information Retrieval Using a Versatile Processing Element
Main Conference
Sangpyo Kim Seoul National University, Hyesung Ji Seoul National University, Jongmin Kim Seoul National University, Jaiyoung Park Seoul National University, Wonseok Choi Seoul National University, Jung Ho Ahn Seoul National University