ESTroM: Element-Flow Architecture For Processing Sparse Tractable Probabilistic Models
Probabilistic Circuits (PCs) are an emerging class of popular tractable probabilistic models. Their internal connections are represented as directed acyclic graphs (DAGs) of sum nodes and product nodes, which provides parameter efficiency and model expressiveness for probabilistic inference. Despite these algorithmic advantages, executing PCs still faces graph-structure deployment issues: PyJuice's block-sparse parallel computation on GPUs suffers from a parallelism-sparsity gap, while DAG-style processing does not exploit the repetitive structure of PC internal nodes, resulting in low throughput. To address this challenge, this work proposes ESTroM, an efficient architecture that provides novel graph-element (node/edge) parallelism with sparsity-aware compilation. Based on an analysis of the computational requirements of sum and product nodes, the ESTroM core uses compressed matrices to represent the sum/product-node DAG, edge-based dataflow for product-node processing, and node-based dataflow for sum-node processing. With intra-core rewind and inter-core multicast optimizations, we develop a prototype ESTroM chip and a demonstration system for a PC-based neural lossless compression application. Our ablation experiments show that ESTroM offers a 1.75$\sim$3.56$\times$ speedup over the state-of-the-art DAG processing unit (DPU)-v2 with the same computing resources. Across various typical PC structures, ESTroM achieves an 18.7$\times$ speedup over DPU-v2 and a 3.9$\times$ speedup over an NVIDIA RTX 4090 GPU running the PyJuice framework. For neural lossless compression, ESTroM delivers a 1.39$\times$ higher compression ratio than the industry-standard Zstandard (Zstd) algorithm at its highest compression level, while offering 16.3$\sim$65.2$\times$ faster compression than Zstd on an Intel Xeon Gold 6230.
In a nutshell, this work develops novel graph-element parallelism and element-flow architecture theory, together with practical prototype chips and systems, revealing a new hardware-perspective path toward the ``scaling law'' of emerging tractable probabilistic models.
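To make the sum/product-node semantics concrete, below is a minimal sketch (not ESTroM's implementation, and unrelated to its hardware dataflow) of bottom-up evaluation of a tiny probabilistic circuit represented as a DAG. The node names, tuple layout, and the `eval_pc` helper are illustrative assumptions; product nodes multiply their children's values, and sum nodes take a weighted mixture, which is what makes PC inference tractable.

```python
# Illustrative sketch: feed-forward evaluation of a small probabilistic
# circuit (PC). Nodes are listed in topological order, leaves first.
def eval_pc(nodes, leaf_values):
    """nodes: list of (name, kind, children, weights) tuples, where kind is
    'leaf', 'prod', or 'sum'; weights apply only to sum nodes.
    leaf_values: dict mapping leaf name -> probability of its evidence."""
    values = {}
    for name, kind, children, weights in nodes:
        if kind == 'leaf':
            values[name] = leaf_values[name]
        elif kind == 'prod':
            v = 1.0
            for c in children:          # product node: multiply children
                v *= values[c]
            values[name] = v
        else:                           # sum node: weighted mixture
            values[name] = sum(w * values[c]
                               for w, c in zip(weights, children))
    return values

# A tiny circuit: p(x1, x2) = 0.6 * p1(x1) p2(x2) + 0.4 * p3(x1) p4(x2)
circuit = [
    ('l1', 'leaf', [], None), ('l2', 'leaf', [], None),
    ('l3', 'leaf', [], None), ('l4', 'leaf', [], None),
    ('m1', 'prod', ['l1', 'l2'], None),
    ('m2', 'prod', ['l3', 'l4'], None),
    ('root', 'sum', ['m1', 'm2'], [0.6, 0.4]),
]
vals = eval_pc(circuit, {'l1': 0.9, 'l2': 0.5, 'l3': 0.2, 'l4': 0.8})
print(vals['root'])   # = 0.6*0.45 + 0.4*0.16 = 0.334
```

Real PCs have millions of such nodes with irregular sparse connectivity, which is why the naive per-node loop above maps poorly to GPUs and motivates the graph-element parallelism described in the abstract.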
Tue 3 Feb (displayed time zone: Hobart)
14:10 - 15:30

14:10 | 20m Talk | BASES: Enabling Energy-Efficient and Error-Resilient Analog CIM Acceleration via Reformation of Coding Bases (Main Conference)
Hongrui Guo (Institute of Computing Technology, Chinese Academy of Sciences), Tianrui Ma (Institute of Computing Technology, Chinese Academy of Sciences), Zidong Du (Institute of Computing Technology, Chinese Academy of Sciences), Mo Zou (Institute of Computing Technology, Chinese Academy of Sciences), Yifan Hao (ICT, Chinese Academy of Sciences), Yongwei Zhao (Institute of Computing Technology, Chinese Academy of Sciences), Rui Zhang (Chinese Academy of Sciences), Wei Li (Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Xing Hu (Institute of Computing Technology, Chinese Academy of Sciences), Zhiwei Xu (Institute of Computing Technology, Chinese Academy of Sciences), Qi Guo (Chinese Academy of Sciences), Tianshi Chen (Cambricon Technologies)

14:30 | 20m Talk | A PN-Free Digital SAT Accelerator Using Crossbar Architecture and Frequency-Controlled Counters (Main Conference)
Zhezheng Ren (University of Waterloo), Chenao Yuan (University of Waterloo), Yuke Zhang (University of Toronto), Shiyu Su (University of Waterloo)

14:50 | 20m Talk | ESTroM: Element-Flow Architecture For Processing Sparse Tractable Probabilistic Models (Main Conference)
Anjunyi Fan (Peking University), Xuejie Liu (Peking University), Anji Liu (University of California, Los Angeles), Qiuping Wu (Peking University), Jiaqi Yang (Peking University), Yuchao Qin (Peking University), Guy Van den Broeck (University of California, Los Angeles), Yitao Liang (Peking University), Bonan Yan (Peking University)

15:10 | 20m Talk | GustavSNN: Unleashing the Power of Gustavson's Algorithm on SNN Acceleration with Column-Parallel Tick-Batch Dataflow (Main Conference)
Sangwoo Hwang (Korea University), Donghun Lee (Korea University), Jahyun Koo (DGIST), Jaeha Kung (Korea University)