An Efficient and Scalable Hardware Architecture for Number Theoretic Transform on FPGA with Design Automation
Fully Homomorphic Encryption (FHE) has become a promising approach to protecting data privacy in emerging application scenarios. Unfortunately, FHE is significantly slower than plaintext computation, and one of its primary bottlenecks is the time-consuming Number Theoretic Transform (NTT). Accelerating NTT across a range of FHE parameters is therefore crucial to making FHE practical. With their highly reconfigurable and performant logic fabric, Field Programmable Gate Arrays (FPGAs) have shown great potential for NTT acceleration.
By decomposing a large-point NTT with strong data dependencies into independent, simple small-point NTTs, the emerging Ten-step NTT (TNTT) algorithm intuitively enables higher parallelism and thereby has the potential to deliver better performance than traditional algorithms. However, our quantitative analysis reveals that TNTT exhibits significant performance degradation as parallelism increases, due to the additional varying-size transpositions and Hadamard products.
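To make the decomposition idea concrete, the following is a minimal sketch of the classic four-step NTT, a well-known simpler relative of decomposition-based NTTs. It is not the paper's TNTT or the AutoNest dataflow. The function names, the toy prime 17, and the root 3 are illustrative assumptions. The sketch shows where the small-point NTTs, the twiddle-factor Hadamard product, and the transposition appear.

```python
def naive_ntt(x, w, p):
    """Reference O(n^2) cyclic NTT: X[k] = sum_j x[j] * w^(j*k) mod p."""
    n = len(x)
    return [sum(x[j] * pow(w, j * k, p) for j in range(n)) % p for k in range(n)]


def four_step_ntt(x, n1, n2, w, p):
    """Four-step NTT of length n1*n2 (illustrative helper, not from the paper).

    w must be a primitive (n1*n2)-th root of unity modulo the prime p.
    """
    assert len(x) == n1 * n2
    # View the input as an n1 x n2 matrix in row-major order.
    m = [x[r * n2:(r + 1) * n2] for r in range(n1)]

    # Step 1: independent n1-point NTTs down each column, using root w^n2.
    w_col = pow(w, n2, p)
    cols = [naive_ntt([m[r][c] for r in range(n1)], w_col, p) for c in range(n2)]

    # Step 2: Hadamard product with twiddle factors w^(c * k1).
    tw = [[cols[c][k1] * pow(w, c * k1, p) % p for c in range(n2)]
          for k1 in range(n1)]

    # Step 3: independent n2-point NTTs along each row, using root w^n1.
    w_row = pow(w, n1, p)
    rows = [naive_ntt(tw[k1], w_row, p) for k1 in range(n1)]

    # Step 4: transpose, so that output index k = k1 + n1 * k2.
    return [rows[k1][k2] for k2 in range(n2) for k1 in range(n1)]


if __name__ == "__main__":
    p, w = 17, 3  # toy parameters: 3 has multiplicative order 16 modulo 17
    x = list(range(16))
    assert four_step_ntt(x, 4, 4, w, p) == naive_ntt(x, w, p)
```

The small NTTs in steps 1 and 3 are mutually independent, which is the source of the parallelism. Steps 2 and 4 are the extra Hadamard product and transposition that a decomposition of this kind introduces.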
This paper proposes AutoNest, an efficient and scalable hardware architecture, along with an accelerator auto-generation framework for TNTT. The proposed hardware architecture maximizes performance by 1) adopting a 2D block decomposition dataflow to address critical path delays in the transpose logic, thereby improving clock frequency, and 2) integrating algorithm-level, cost-free twiddle factor fusion to reduce the number of modular multiplications in Hadamard products, thereby allowing higher on-chip parallelism. Moreover, we deliver an accelerator generation framework that conducts automated design space exploration to elaborate a performant TNTT architecture within the target FPGA's resource budget for user-defined FHE parameters. Experimental results on the AMD-Xilinx U280 FPGA demonstrate that NTT accelerators generated by AutoNest achieve an average speedup of 2.31× compared to prior designs.
Tue 3 Feb (displayed time zone: Hobart)
11:30 - 12:50 | Zero-Knowledge and Private Information Retrieval | Main Conference at Cronulla | Chair(s): Hanjun Kim (POSTECH)
11:30 | 20m Talk | zkPHIRE: A Programmable Accelerator for ZKPs over HIgh-degRee, Expressive Gates | Main Conference | Alhad Daftardar (New York University), Jianqiao Cambridge Mo (New York University), Joey Ah-kiow (New York University), Benedikt Bünz (New York University), Siddharth Garg (New York University), Brandon Reagen (New York University)
11:50 | 20m Talk | Conflux: A High-Performance Keyword Private Retrieval System for Dynamic Datasets | Main Conference | Zehao Chen (Shandong University), Zhaoyan Shen (Shandong University), Qian Wei (Shandong University), Hang Lu (Institute of Computing Technology, Chinese Academy of Sciences), Lei Ju (Shandong University)
12:10 | 20m Talk | An Efficient and Scalable Hardware Architecture for Number Theoretic Transform on FPGA with Design Automation | Main Conference | Yilan Zhu (Ant Group), Geng Yang (Ant Group), Xingyu Tian (Simon Fraser University), Dilshan Kumarathunga (Simon Fraser University), Liang Kong (Ant Group), Xianglong Deng (UCAS), Shengyu Fan (UCAS), Guang Fan (Ant Group), Guiming Shi (Tsinghua University), Lei Chen (University of Chinese Academy of Sciences), Bo Zhang (Ant Group), Yisong Chang (Ant Group), Shoumeng Yan (Ant Group), Zhenman Fang (Simon Fraser University), Mingzhe Zhang (Ant Group)
12:30 | 20m Talk | IVE: An Accelerator for Single-Server Private Information Retrieval Using a Versatile Processing Element | Main Conference | Sangpyo Kim (Seoul National University), Hyesung Ji (Seoul National University), Jongmin Kim (Seoul National University), Jaiyoung Park (Seoul National University), Wonseok Choi (Seoul National University), Jung Ho Ahn (Seoul National University)