SFD: Towards Segment Fusion Dataflow for Spatial Accelerators
Spatial accelerators are a promising way to meet the growing performance and energy-efficiency demands of deep neural networks (DNNs). Because on-chip compute cores far outpace off-chip memory bandwidth, common DNNs suffer from poor operational intensity and are increasingly memory-bound. While operator fusion has shown potential in alleviating this bottleneck, existing approaches suffer from two key limitations: they rely on predefined fusion templates before tensor mapping, and they impose tile constraints during mapping. As a result, they overlook opportunities to fuse more operators and yield sub-optimal performance.
In this paper, we propose SFD, a segment fusion dataflow optimization framework. Central to SFD is a dataflow abstraction that enables template-free operator fusion after mapping and supports tile-constraint relaxation through tile scheduling. Based on this abstraction, we first introduce a memory-centric mapper, which defines a design space and incorporates an algorithm to facilitate design space exploration (DSE). We then propose an analytical network segmenter, which leverages the mapping results to analyze tensor lifetimes and on-chip memory usage, fusing operators into variable-length segments. Finally, we introduce a dependency-aware tile scheduler, which maintains a priority queue for each segment to ensure a correct execution order. Extensive experiments on different DNNs show that SFD achieves 1.4× to 2.2× speedups on spatial accelerators over state-of-the-art fusion frameworks.
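The abstract names two mechanisms: fusing operators into variable-length segments under an on-chip memory budget, and draining each segment's tiles through a dependency-aware priority queue. The sketch below is a hypothetical illustration of these two ideas, not the paper's actual algorithm; all function names, footprint numbers, and the greedy cut rule are assumptions for exposition.

```python
import heapq

def segment_ops(op_footprints, mem_budget):
    """Greedily grow segments of consecutive operators; start a new
    segment whenever the summed footprint of live intermediate tensors
    would exceed the on-chip memory budget. (Illustrative only; the
    paper's segmenter also uses tensor-lifetime analysis.)"""
    segments, current, used = [], [], 0
    for op, footprint in op_footprints:
        if current and used + footprint > mem_budget:
            segments.append(current)
            current, used = [], 0
        current.append(op)
        used += footprint
    if current:
        segments.append(current)
    return segments

def schedule_tiles(tiles, deps):
    """Order a segment's tiles with a priority queue keyed by dependency
    depth, so producer tiles run before their consumers. `deps` maps a
    tile to the set of tiles it depends on (assumed acyclic)."""
    depth = {}
    def d(t):
        if t not in depth:
            depth[t] = 1 + max((d(p) for p in deps.get(t, ())), default=0)
        return depth[t]
    heap = [(d(t), t) for t in tiles]
    heapq.heapify(heap)
    done, order = set(), []
    while heap:
        prio, t = heapq.heappop(heap)
        if deps.get(t, set()) <= done:
            done.add(t)
            order.append(t)
        else:
            heapq.heappush(heap, (prio + 1, t))  # retry after producers
    return order

# Example: a 64 KB budget splits four ops into two fused segments,
# and tiles of the SFD-style segment execute in dependency order.
segs = segment_ops([("conv1", 40), ("relu1", 8), ("conv2", 60), ("relu2", 4)], 64)
order = schedule_tiles(["a", "b", "c"], {"b": {"a"}, "c": {"b"}})
```

Here `segs` comes out as `[["conv1", "relu1"], ["conv2", "relu2"]]` and `order` as `["a", "b", "c"]`; a real implementation would derive footprints and tile dependencies from the mapper's results rather than hard-coded numbers.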
Tue 3 Feb (times shown in the Hobart time zone)
11:30 - 12:50 | Visual and Multimodal Acceleration | Main Conference at Coogee | Chair(s): Yu Feng (Shanghai Jiao Tong University)
11:30 (20m, Talk) | V-Rex: Real-Time Streaming Video LLM Acceleration via Dynamic KV Cache Retrieval | Main Conference
11:50 (20m, Talk) | SFD: Towards Segment Fusion Dataflow for Spatial Accelerators | Main Conference | Fuyu Wang (Sun Yat-sen University), Minghua Shen (Sun Yat-sen University), Yufei Ding (UCSD), Nong Xiao (National University of Defense Technology & Sun Yat-sen University), Yutong Lu (Sun Yat-sen University)
12:10 (20m, Talk) | VAR-Turbo: Unlocking the Potential of Visual Autoregressive Models through Dual Redundancy | Main Conference | Xujiang Xiang (The Hong Kong University of Science and Technology), Fengbin Tu (The Hong Kong University of Science and Technology)
12:30 (20m, Talk) | Cambricon-GS: An Accelerator for 3D Gaussian Splatting Training with Gaussian-Pixel Hybrid Parallelism | Main Conference | Rui Wen (Institute of Computing Technology, Chinese Academy of Sciences), Zhifei Yue (University of Science and Technology of China), Tianbo Liu (University of Science and Technology of China), Xinkai Song (Institute of Computing Technology, Chinese Academy of Sciences), Jin Li (Institute of Computing Technology, Chinese Academy of Sciences), Di Huang (Institute of Computing Technology, Chinese Academy of Sciences), Jiaming Guo (Institute of Computing Technology, Chinese Academy of Sciences), Xing Hu (Institute of Computing Technology, Chinese Academy of Sciences), Zidong Du (Institute of Computing Technology, Chinese Academy of Sciences), Qi Guo (Chinese Academy of Sciences), Tianshi Chen (Cambricon Technologies)