Uni-STC: Unified Sparse Tensor Core
This program is tentative and subject to change.
Modern processors are increasingly adopting tensor cores as key computational units. Compared to existing designs for dense and structured sparsity, recent dual-side sparse tensor cores support general sparsity. However, existing methods still face limitations in genericity (incomplete sparse kernel support prevents broad applicability) and performance (outer-product/row-row schemes yield unsatisfactory hardware utilization, data reuse, and energy efficiency).
In this paper, we propose Uni-STC, a unified sparse tensor core that delivers high-performance dataflows for four key sparse kernels: sparse matrix-vector multiplication (SpMV), sparse matrix-sparse vector multiplication (SpMSpV), sparse matrix-multiple vector multiplication (SpMM), and sparse general matrix-matrix multiplication (SpGEMM). To efficiently support these diverse sparse workloads, we introduce BBC, a unified sparse format co-designed with Uni-STC’s dataflow. We then design Uni-STC’s architecture supporting (1) fine-grained task partitioning to improve resource utilization, (2) parallel sparse-tile processing for enhanced data reuse, and (3) a dynamic network to reduce intermediate data movement and energy consumption. Evaluated across 2893 SuiteSparse and 302 DLMC matrices, Uni-STC demonstrates significant improvements over state-of-the-art sparse tensor cores in both performance and energy efficiency.
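For readers unfamiliar with the four kernels named in the abstract, the following is a minimal sketch of their reference semantics in plain Python. It is an illustration only: the dict-of-rows storage ({row: {col: value}}), the function names, and the loop orders are assumptions made for this example. It does not represent the BBC format, Uni-STC's dataflow, or any hardware behaviour described in the paper. The SpGEMM sketch uses a simple row-row accumulation because it is the shortest to write down.

```python
# Reference semantics of SpMV, SpMSpV, SpMM and SpGEMM.
# Assumption: sparse matrices are stored as {row: {col: value}} (hypothetical
# layout for illustration; this is NOT the BBC format from the talk).

def spmv(A, n_rows, x):
    """Sparse matrix times dense vector; returns a dense list."""
    y = [0.0] * n_rows
    for i, row in A.items():
        for j, a in row.items():
            y[i] += a * x[j]
    return y

def spmspv(A, x):
    """Sparse matrix times sparse vector {index: value}; returns a sparse dict."""
    y = {}
    for i, row in A.items():
        for j, a in row.items():
            if j in x:  # only positions where both operands are stored contribute
                y[i] = y.get(i, 0.0) + a * x[j]
    return y

def spmm(A, n_rows, B):
    """Sparse matrix times dense matrix (list of rows); returns a dense matrix."""
    n_cols = len(B[0])
    C = [[0.0] * n_cols for _ in range(n_rows)]
    for i, row in A.items():
        for k, a in row.items():
            for j in range(n_cols):
                C[i][j] += a * B[k][j]
    return C

def spgemm(A, B):
    """Sparse times sparse, accumulated row by row; returns a sparse dict."""
    C = {}
    for i, row in A.items():
        acc = {}
        for k, a in row.items():
            for j, b in B.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a * b
        if acc:
            C[i] = acc
    return C
```

The four kernels differ only in which operands are sparse and whether the right-hand side is a vector or a matrix, which is the sense in which a single core could plausibly serve all of them.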
Tue 3 Feb (displayed time zone: Hobart)
15:50 - 17:10
15:50 20m Talk | Uni-STC: Unified Sparse Tensor Core | Main Conference | Haocheng Lian (China University of Petroleum-Beijing), Qiyue Zhang (China University of Petroleum-Beijing), Xinran Zhao (China University of Petroleum-Beijing), Meichen Dong (China University of Petroleum-Beijing), Yijie Nie (China University of Petroleum-Beijing), Zhengyi Zhao (China University of Petroleum-Beijing), Junzhong Shen (National University of Defense Technology), Wei Guo (National University of Defense Technology), Chun Huang (National University of Defense Technology), Bingcai Sui (National University of Defense Technology), Weifeng Liu (China University of Petroleum-Beijing)
16:10 20m Talk | AUM: Unleashing the Efficiency Potential of Shared Processors with Accelerator Units for LLM Serving | Main Conference | Xinkai Wang (Shanghai Jiao Tong University), Chao Li (Shanghai Jiao Tong University), Yiming Zhuansun (Shanghai Jiao Tong University), Jinyang Guo (Shanghai Jiao Tong University), Xiaofeng Hou (Shanghai Jiao Tong University), Jing Wang (Shanghai Jiao Tong University), Luping Wang (Alibaba Group), Weigao Chen (Alibaba Group), Cheng Huang (Alibaba Group), Guodong Yang (Alibaba Group), Liping Zhang (Alibaba Group), Minyi Guo (Shanghai Jiao Tong University)
16:30 20m Talk | DRACO: A Hardware-Efficient Robot Rigid Body Dynamics Accelerator with Precision-Aware Quantization Framework | Main Conference | Xingyu Liu (The Hong Kong University of Science and Technology), Jiawei Liang (The Hong Kong University of Science and Technology), Yipu Zhang (The Hong Kong University of Science and Technology), Linfeng Du (The Hong Kong University of Science and Technology), Chaofang Ma (The Hong Kong University of Science and Technology), Hui Yu (Hong Kong University of Science and Technology), Xu Jiang (University of Electronic Science and Technology of China), Wei Zhang (The Hong Kong University of Science and Technology)
16:50 20m Talk | REASON: Accelerating Probabilistic Logical Reasoning for Neuro-Symbolic Cognitive Intelligence | Main Conference | Zishen Wan (Georgia Institute of Technology), Che-Kai Liu (Georgia Institute of Technology), Jiayi Qian (Georgia Institute of Technology), Hanchen Yang (Georgia Institute of Technology), Arijit Raychowdhury (Georgia Institute of Technology), Tushar Krishna (Georgia Institute of Technology)