HPCA 2026
Sat 31 January - Wed 4 February 2026 Sydney, Australia
co-located with CGO/PPoPP/CC 2026

This program is tentative and subject to change.

Wed 4 Feb 2026 11:50 - 12:10 at Coogee - Efficient Serving and Resource Management

Microservices suffer from the execution of auxiliary operations collectively known as the datacenter tax. These operations include RPC and TCP processing, data serialization/deserialization, encryption/decryption, and compression/decompression. To minimize this tax, numerous individual hardware accelerators have been proposed. Further, for environments with monolithic applications, researchers have proposed frameworks that orchestrate multiple accelerators integrated within the server package.

In this paper, we characterize the needs of orchestrating an ensemble of on-package accelerators in microservice environments and observe that such orchestration frameworks need to be highly dynamic and nimble: the sequences of accelerators needed vary across invocations of the same service, and the basic operations to be accelerated are fine-grained and may take only tens of microseconds. Moreover, the sequence of accelerators to use is often affected by “branch conditions,” whose real-time resolution determines the set of subsequent accelerators needed. To address these challenges, this paper presents AccelFlow, the first orchestration framework for on-package accelerators running microservice workloads. In AccelFlow, CPU cores build software structures called Traces that contain sequences of accelerators to call. A CPU core enqueues a trace in an accelerator in user mode and, from then on, the accelerators in the trace execute in sequence without CPU involvement. A trace can include branch conditions whose outcomes determine the control flow inside the trace. It may also contain the address of the next trace to execute after the current one completes. We evaluate AccelFlow with full-system simulations and show that it is highly effective. Compared to state-of-the-art accelerator orchestrators, AccelFlow on average reduces P99 tail latency by 70%, reduces average latency by 38%, and increases throughput by 120%.
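To make the Trace abstraction concrete, here is a minimal software sketch of the mechanism the abstract describes: a sequence of accelerator invocations, branch conditions resolved at run time that select the next accelerator, and a pointer chaining to the next trace. All names (`Node`, `Trace`, `run_trace`) and the toy "accelerators" are illustrative assumptions, not interfaces from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Node:
    # Stand-in for one on-package accelerator invocation.
    accel: Callable[[bytes], bytes]
    # Optional branch condition: given the data, returns an index into `next`.
    branch: Optional[Callable[[bytes], int]] = None
    # Indices of successor nodes within the trace.
    next: list = field(default_factory=list)

@dataclass
class Trace:
    nodes: list
    # Address of the next trace to run after this one completes.
    next_trace: "Optional[Trace]" = None

def run_trace(trace, data):
    """Walk the trace; in AccelFlow this would proceed without CPU
    involvement once enqueued. Branch outcomes choose the next node."""
    while trace is not None:
        i = 0
        while i is not None and i < len(trace.nodes):
            node = trace.nodes[i]
            data = node.accel(data)
            if node.branch is not None:
                i = node.next[node.branch(data)]
            else:
                i = node.next[0] if node.next else None
        trace = trace.next_trace
    return data

# Example datacenter-tax pipeline: deserialize, then branch on payload
# size to decide whether a decompression step is needed, then "decrypt".
deserialize = lambda d: d.strip(b"\x00")
decompress  = lambda d: d * 2                      # placeholder transform
decrypt     = lambda d: bytes(b ^ 0xFF for b in d) # XOR stand-in

t = Trace(nodes=[
    Node(deserialize, branch=lambda d: 0 if len(d) > 4 else 1, next=[1, 2]),
    Node(decompress, next=[2]),
    Node(decrypt, next=[]),
])
```

The branch at node 0 mirrors the paper's observation that the accelerator sequence can differ across invocations of the same service: large payloads take the decompression path, small ones skip straight to decryption.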


Wed 4 Feb

Displayed time zone: Hobart

11:30 - 12:50
Efficient Serving and Resource Management (Main Conference) at Coogee
11:30
20m
Talk
Near-Zero-Overhead Freshness for Recommendation Systems via Inference-Side Model Updates
Main Conference
Wenjun Yu Hong Kong Baptist University, Sitian Chen Hong Kong Baptist University, Amelie Chi Zhou Hong Kong Baptist University, Cheng Chen ByteDance, China
11:50
20m
Talk
AccelFlow: Orchestrating an On-Package Ensemble of Fine-Grained Accelerators for Microservices
Main Conference
Jovan Stojkovic University of Illinois Urbana-Champaign, Abraham Farrell University of Illinois Urbana-Champaign, Zhangxiaowen Gong Intel, Christopher J. Hughes Intel, Josep Torrellas University of Illinois Urbana-Champaign
12:10
20m
Talk
SpotCC: Facilitating Coded Computation for Prediction Serving Systems on Spot Instances
Main Conference
Lin Wang, Yuchong Hu Huazhong University of Science and Technology, Ziling Duan Huazhong University of Science and Technology, Mingqi Li Huazhong University of Science and Technology, Chenxuan Yao Huazhong University of Science and Technology, Feifan Liu Huazhong University of Science and Technology, Xiaolu Li Huazhong University of Science and Technology, Leihua Qin Huazhong University of Science and Technology, Dan Feng Huazhong University of Science and Technology, China
12:30
20m
Talk
LowCarb: Carbon-Aware Scheduling of Serverless Functions
Main Conference
Rohan Basu Roy University of Utah, Devesh Tiwari Northeastern University