VAR-Turbo: Unlocking the Potential of Visual Autoregressive Models through Dual Redundancy (HPCA 2026 - Main Conference)

Who

Xujiang Xiang, Fengbin Tu

Track

HPCA 2026 Main Conference

Time Zone

The program is currently displayed in (GMT+11:00) Hobart.

When

Tue 3 Feb 2026 12:10 - 12:30 at Coogee - Visual and Multimodal Acceleration. Chair(s): Yu Feng

Abstract

Image synthesis has recently drawn enormous attention from both academia and industry owing to recent advances in generative models, which can now generate photorealistic images conditioned on words supplied by humans. Among the generative models for image synthesis, the Visual AutoRegressive (VAR) model is a promising avenue due to its strong scalability. Nevertheless, its exorbitant computational cost poses a formidable obstacle to widespread adoption. To this end, we propose a dedicated software/hardware co-design framework, dubbed VAR-Turbo, for unlocking the potential of VAR models. Specifically, at the software level, we propose a Draft-Free Parallel Decoding scheme that exploits Image Redundancy to decrease the number of sampling steps by > 80%, and a combination of Token Aggregation and Dynamic Bypass that capitalizes on the Model Redundancy introduced by the generative transformer to reduce the computational load by > 60%. At the hardware level, we propose a dedicated accelerator featuring 1) a Unified Attention Core and 2) a Radix Sort Core, which together support the aforementioned algorithm pipeline seamlessly and efficiently. Through this collaborative software/hardware design, VAR-Turbo achieves, on average, 5047.4x, 210.3x, 6.1x, and 3.8x speedups and 24818.2x, 423.5x, 6.0x, and 7.8x energy-efficiency improvements over a Xeon 8168 CPU, an Nvidia V100 GPU, ViTCoD, and AdapTiV, respectively, while maintaining generation quality.
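
The abstract names a Radix Sort Core without describing what it sorts or how. For readers unfamiliar with the underlying algorithm, the following is a minimal software sketch of a least-significant-digit radix sort on small unsigned integers. It is not the accelerator's design: the function name `radix_sort`, the 4-bit digit width, the 16-bit key width, and the sample `scores` list are all illustrative assumptions, and the abstract does not say which quantities the hardware actually sorts.

```python
def radix_sort(keys, bits_per_pass=4, key_bits=16):
    """LSD radix sort for non-negative integers below 2**key_bits.

    Hypothetical helper for illustration only; parameters are assumptions,
    not values taken from the VAR-Turbo paper.
    """
    radix = 1 << bits_per_pass          # number of buckets per pass
    mask = radix - 1                    # extracts one digit
    out = list(keys)
    for shift in range(0, key_bits, bits_per_pass):
        # Pass step 1: histogram of the current digit.
        counts = [0] * radix
        for k in out:
            counts[(k >> shift) & mask] += 1
        # Pass step 2: exclusive prefix sum gives each bucket's start offset.
        offsets = [0] * radix
        for d in range(1, radix):
            offsets[d] = offsets[d - 1] + counts[d - 1]
        # Pass step 3: stable scatter, preserving the order of equal digits.
        nxt = [0] * len(out)
        for k in out:
            d = (k >> shift) & mask
            nxt[offsets[d]] = k
            offsets[d] += 1
        out = nxt
    return out


# Made-up sample keys, e.g. quantized scores in [0, 65535].
scores = [513, 7, 65535, 42, 7, 1024, 0]
print(radix_sort(scores))  # [0, 7, 7, 42, 513, 1024, 65535]
```

Each pass consists of a histogram, a prefix sum, and a stable scatter, with no data-dependent comparisons between keys. That regular structure is a common reason radix sort suits fixed-function hardware; whether VAR-Turbo adopts it for that reason is not stated in the abstract.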

Xujiang Xiang

The Hong Kong University of Science and Technology

Fengbin Tu