Characterizing Cloud-Native LLM Inference at ByteDance and Exposing Optimization Challenges and Opportunities for Future AI Accelerators
As a major provider of LLM inference services, ByteDance has continuously explored diverse accelerator options to meet the rapidly growing inference demands of heterogeneous LLM scenarios more cost-effectively, thereby enabling LLMs to serve more people worldwide. In the process, however, we have found that the complexity and opacity of cloud scenarios and the corresponding cloud accelerators make it difficult for academia and many innovative chip startups to fully understand the real demands and challenges of these scenarios, which in turn severely restricts innovation and application potential in this field.
To bridge this gap, we first present and analyze data and characteristics of the ByteDance Doubao LLM app across multiple dimensions, helping the community understand real-world cloud scenarios, and we detail the challenges and opportunities we have identified. Second, we propose and plan to open-source our multi-level evaluation framework, ByteMLPerf, which includes benchmarks spanning instructions, operators, and models; this framework improves interpretability and trustworthiness and helps promising new accelerator architectures gain wider adoption and development. Finally, we present comparative results for four accelerators currently deployed at scale, summarize their shortcomings and challenges, conduct in-depth analysis, and highlight numerous architectural and scheduling innovation opportunities we have observed.