VAR-Turbo: Unlocking the Potential of Visual Autoregressive Models through Dual Redundancy
Image synthesis has recently drawn enormous attention from both academia and industry thanks to advances in generative models, which can now generate photorealistic images conditioned on human-provided text. Among generative models for image synthesis, the Visual AutoRegressive (VAR) model is a promising avenue due to its strong scalability. Nevertheless, its exorbitant computational cost poses a formidable obstacle to widespread adoption. To this end, we propose VAR-Turbo, a dedicated software/hardware co-design framework for unlocking the potential of VAR models. Specifically, at the software level, we propose a Draft-Free Parallel Decoding scheme that exploits Image Redundancy to decrease the number of sampling steps by more than 80%, and a combination of Token Aggregation and Dynamic Bypass that capitalizes on the Model Redundancy introduced by the generative transformer to reduce the computational load by more than 60%. At the hardware level, we propose a dedicated accelerator featuring 1) a Unified Attention Core and 2) a Radix Sort Core, which together support the aforementioned algorithm pipeline seamlessly and efficiently. Through this synergy of software and hardware, VAR-Turbo achieves, on average, 5047.4x, 210.3x, 6.1x, and 3.8x speedups and 24818.2x, 423.5x, 6.0x, and 7.8x energy-efficiency improvements over a Xeon 8168 CPU, an Nvidia V100 GPU, ViTCoD, and AdapTiV, respectively, while maintaining generation quality.
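The abstract names a Radix Sort Core but does not describe its design or how the algorithm pipeline uses it. As general background only, the sketch below shows a textbook least-significant-digit radix sort that returns the indices ranking a set of integer keys. The function name `radix_argsort`, the 8-bit key width, the 4-bit digit width, and the idea that the keys are quantized per-token scores are all assumptions made for illustration, not details taken from VAR-Turbo.

```python
def radix_argsort(keys, key_bits=8, radix_bits=4):
    """Return the indices that sort `keys` in ascending order.

    Textbook stable LSD radix sort. Assumes every key is a non-negative
    integer smaller than 2**key_bits (hypothetical quantized scores).
    """
    order = list(range(len(keys)))
    mask = (1 << radix_bits) - 1
    # Process one digit of `radix_bits` bits per pass, lowest digit first.
    for shift in range(0, key_bits, radix_bits):
        buckets = [[] for _ in range(1 << radix_bits)]
        for idx in order:
            digit = (keys[idx] >> shift) & mask
            buckets[digit].append(idx)  # appending keeps each pass stable
        # Concatenate buckets in digit order to form the next ordering.
        order = [idx for bucket in buckets for idx in bucket]
    return order


# Hypothetical 8-bit scores for five tokens.
scores = [200, 13, 255, 13, 97]
ranked = radix_argsort(scores)
print(ranked)  # [1, 3, 4, 0, 2]
```

Radix sort needs no comparisons and a fixed number of passes (here two), which is one common reason it is chosen for hardware sorters; whether that is the motivation in VAR-Turbo is not stated in the abstract.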
Tue 3 Feb (displayed time zone: Hobart)
11:30 - 12:50 | Visual and Multimodal Acceleration | Main Conference at Coogee | Chair(s): Yu Feng (Shanghai Jiao Tong University)
11:30 | 20m | Talk | V-Rex: Real-Time Streaming Video LLM Acceleration via Dynamic KV Cache Retrieval | Main Conference
11:50 | 20m | Talk | SFD: Towards Segment Fusion Dataflow for Spatial Accelerators | Main Conference | Fuyu Wang (Sun Yat-sen University), Minghua Shen (Sun Yat-sen University), Yufei Ding (UCSD), Nong Xiao (National University of Defense Technology & Sun Yat-sen University), Yutong Lu (Sun Yat-sen University)
12:10 | 20m | Talk | VAR-Turbo: Unlocking the Potential of Visual Autoregressive Models through Dual Redundancy | Main Conference | Xujiang Xiang (The Hong Kong University of Science and Technology), Fengbin Tu (The Hong Kong University of Science and Technology)
12:30 | 20m | Talk | Cambricon-GS: An Accelerator for 3D Gaussian Splatting Training with Gaussian-Pixel Hybrid Parallelism | Main Conference | Rui Wen (Institute of Computing Technology, Chinese Academy of Sciences), Zhifei Yue (University of Science and Technology of China), Tianbo Liu (University of Science and Technology of China), Xinkai Song (Institute of Computing Technology, Chinese Academy of Sciences), Jin Li (Institute of Computing Technology, Chinese Academy of Sciences), Di Huang (Institute of Computing Technology, Chinese Academy of Sciences), Jiaming Guo (Institute of Computing Technology, Chinese Academy of Sciences), Xing Hu (Institute of Computing Technology, Chinese Academy of Sciences), Zidong Du (Institute of Computing Technology, Chinese Academy of Sciences), Qi Guo (Chinese Academy of Sciences), Tianshi Chen (Cambricon Technologies)