Uni-STC: Unified Sparse Tensor Core (HPCA 2026 - Main Conference)

Who

Haocheng Lian, Qiyue Zhang, Xinran Zhao, Meichen Dong, Yijie Nie, Zhengyi Zhao, Junzhong Shen, Wei Guo, Chun Huang, Bingcai Sui, Weifeng Liu

Track

HPCA 2026 Main Conference

Time Zone

The program is currently displayed in (GMT+11:00) Hobart.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Tue 3 Feb 2026 15:50 - 16:10 at Cronulla - Domain Specific Accelerators Chair(s): Jaewoong Sim

Abstract

Modern processors are increasingly adopting tensor cores as key computational units. Compared to existing designs for dense and structured sparsity, recent dual-side sparse tensor cores support general sparsity. However, existing methods still face limitations in genericity (incomplete sparse kernel support prevents broad applicability) and performance (outer-product/row-row schemes yield unsatisfactory hardware utilization, data reuse, and energy efficiency).

In this paper, we propose Uni-STC, a unified sparse tensor core that delivers high-performance dataflows for four key sparse kernels: sparse matrix-vector multiplication (SpMV), sparse matrix-sparse vector multiplication (SpMSpV), sparse matrix-multiple vector multiplication (SpMM), and sparse general matrix-matrix multiplication (SpGEMM). To efficiently support these diverse sparse workloads, we introduce BBC, a unified sparse format co-designed with Uni-STC’s dataflow. We then design Uni-STC’s architecture supporting (1) fine-grained task partitioning to improve resource utilization, (2) parallel sparse-tile processing for enhanced data reuse, and (3) a dynamic network to reduce intermediate data movement and energy consumption. Evaluated across 2893 SuiteSparse and 302 DLMC matrices, Uni-STC demonstrates significant improvements over state-of-the-art sparse tensor cores in both performance and energy efficiency.
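For readers unfamiliar with the four kernels named in the abstract, the following is a minimal reference sketch of what each one computes. It is not Uni-STC's dataflow and does not use the BBC format, neither of which is described on this page. The dict-of-dicts matrix representation, the function names, and the row-row loop order in `spgemm` are all assumptions chosen for readability.

```python
# Reference semantics only (assumed representation, not the BBC format):
#   sparse matrix = {row: {col: value}}, sparse vector = {index: value},
#   dense vector  = list, dense matrix = list of row lists.

def spmv(A, x, n_rows):
    """SpMV: sparse matrix times dense vector, dense result."""
    y = [0.0] * n_rows
    for i, row in A.items():
        for j, v in row.items():
            y[i] += v * x[j]
    return y

def spmspv(A, x):
    """SpMSpV: sparse matrix times sparse vector, sparse result."""
    y = {}
    for i, row in A.items():
        # Only columns present in both the row and the vector contribute.
        common = [j for j in row if j in x]
        if common:
            y[i] = sum(row[j] * x[j] for j in common)
    return y

def spmm(A, B, n_rows):
    """SpMM: sparse matrix times dense matrix (multiple vectors), dense result."""
    n_cols = len(B[0])
    C = [[0.0] * n_cols for _ in range(n_rows)]
    for i, row in A.items():
        for k, v in row.items():
            for j in range(n_cols):
                C[i][j] += v * B[k][j]
    return C

def spgemm(A, B):
    """SpGEMM: sparse times sparse, sparse result (row-row loop order)."""
    C = {}
    for i, row in A.items():
        acc = {}
        for k, a in row.items():
            # Scale row k of B by A[i][k] and accumulate into row i of C.
            for j, b in B.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a * b
        if acc:
            C[i] = acc
    return C

# A is a 2x3 sparse matrix: [[2, 0, 1], [0, 3, 0]]
A = {0: {0: 2.0, 2: 1.0}, 1: {1: 3.0}}
B_dense = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B_sparse = {0: {0: 1.0}, 1: {1: 1.0}, 2: {0: 1.0, 1: 1.0}}

print(spmv(A, [1.0, 2.0, 3.0], 2))  # [5.0, 6.0]
print(spmspv(A, {2: 4.0}))          # {0: 4.0}
print(spmm(A, B_dense, 2))          # [[3.0, 1.0], [0.0, 3.0]]
print(spgemm(A, B_sparse))          # {0: {0: 3.0, 1: 1.0}, 1: {1: 3.0}}
```

The four kernels differ only in which operands are sparse, which is the sense in which a single unified core can target all of them.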

Haocheng Lian

China University of Petroleum-Beijing

China

Qiyue Zhang

China University of Petroleum-Beijing

Xinran Zhao

China University of Petroleum-Beijing

Meichen Dong

China University of Petroleum-Beijing

Yijie Nie

China University of Petroleum-Beijing

Zhengyi Zhao

China University of Petroleum-Beijing

Junzhong Shen

National University of Defense Technology

Wei Guo

National University of Defense Technology

Chun Huang

National University of Defense Technology

Bingcai Sui

National University of Defense Technology

Weifeng Liu

China University of Petroleum-Beijing

China

Session Program

Tue 3 Feb
Displayed time zone: Hobart

15:50 - 17:10	Domain Specific Accelerators, Main Conference, at Cronulla. Chair(s): Jaewoong Sim (Seoul National University)

15:50 20m Talk		Uni-STC: Unified Sparse Tensor Core (Main Conference). Haocheng Lian (China University of Petroleum-Beijing), Qiyue Zhang (China University of Petroleum-Beijing), Xinran Zhao (China University of Petroleum-Beijing), Meichen Dong (China University of Petroleum-Beijing), Yijie Nie (China University of Petroleum-Beijing), Zhengyi Zhao (China University of Petroleum-Beijing), Junzhong Shen (National University of Defense Technology), Wei Guo (National University of Defense Technology), Chun Huang (National University of Defense Technology), Bingcai Sui (National University of Defense Technology), Weifeng Liu (China University of Petroleum-Beijing)
16:10 20m Talk		AUM: Unleashing the Efficiency Potential of Shared Processors with Accelerator Units for LLM Serving (Main Conference). Xinkai Wang (Shanghai Jiao Tong University), Chao Li (Shanghai Jiao Tong University), Yiming Zhuansun (Shanghai Jiao Tong University), Jinyang Guo (Shanghai Jiao Tong University), Xiaofeng Hou (Shanghai Jiao Tong University), Jing Wang (Shanghai Jiao Tong University), Luping Wang (Alibaba Group), Weigao Chen (Alibaba Group), Cheng Huang (Alibaba Group), Guodong Yang (Alibaba Group), Liping Zhang (Alibaba Group), Minyi Guo (Shanghai Jiao Tong University)
16:30 20m Talk		DRACO: A Hardware-Efficient Robot Rigid Body Dynamics Accelerator with Precision-Aware Quantization Framework (Main Conference). Xingyu Liu (The Hong Kong University of Science and Technology), Jiawei Liang (The Hong Kong University of Science and Technology), Yipu Zhang (The Hong Kong University of Science and Technology), Linfeng Du (The Hong Kong University of Science and Technology), Chaofang Ma (The Hong Kong University of Science and Technology), Hui Yu (Hong Kong University of Science and Technology), Xu Jiang (University of Electronic Science and Technology of China), Wei Zhang (The Hong Kong University of Science and Technology)
16:50 20m Talk		REASON: Accelerating Probabilistic Logical Reasoning for Neuro-Symbolic Cognitive Intelligence (Main Conference). Zishen Wan (Georgia Institute of Technology), Che-Kai Liu (Georgia Institute of Technology), Jiayi Qian (Georgia Institute of Technology), Hanchen Yang (Georgia Institute of Technology), Arijit Raychowdhury (Georgia Institute of Technology), Tushar Krishna (Georgia Institute of Technology)