HPCA 2026
Sat 31 January - Wed 4 February 2026 Sydney, Australia
co-located with CGO/PPoPP/CC 2026

This program is tentative and subject to change.

Wed 4 Feb 2026 11:50 - 12:10 at Coogee - Efficient Serving and Resource Management

Microservices suffer from the execution of auxiliary operations collectively known as the datacenter tax. These operations include RPC and TCP processing, data serialization/deserialization, encryption/decryption, and compression/decompression. To minimize this tax, numerous individual hardware accelerators have been proposed. Further, for environments with monolithic applications, researchers have proposed frameworks that orchestrate multiple accelerators integrated within the server package.

In this paper, we characterize the needs of orchestrating an ensemble of on-package accelerators in microservice environments and observe that such orchestration frameworks need to be highly dynamic and nimble: the sequences of accelerators needed vary across invocations of the same service, and the basic operations to be accelerated are fine-grained and may take only tens of microseconds. Moreover, the sequence of accelerators to use is often affected by “branch conditions,” whose real-time resolution determines the set of subsequent accelerators needed. To address these challenges, this paper presents AccelFlow, the first orchestration framework for on-package accelerators running microservice workloads. In AccelFlow, CPU cores build software structures called Traces that contain sequences of accelerators to call. A CPU core enqueues a trace in an accelerator in user mode and, from then on, the accelerators in the trace execute in sequence without CPU involvement. A trace can include branch conditions whose outcomes determine the control flow inside the trace. It may also contain the address of the next trace to execute after the current one completes. We evaluate AccelFlow with full-system simulations and show that it is highly effective. Compared to state-of-the-art accelerator orchestrators, AccelFlow on average reduces P99 tail latency by 70%, reduces average latency by 38%, and increases throughput by 120%.
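To make the Trace abstraction concrete, here is a minimal software sketch of the mechanism the abstract describes: a sequence of accelerator invocations, branch conditions resolved at run time that select the next accelerator, and a pointer chaining to the next trace. All names (`Node`, `Trace`, `run_trace`) and the toy "accelerators" are illustrative assumptions, not interfaces from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Node:
    # Stand-in for one on-package accelerator invocation.
    accel: Callable[[bytes], bytes]
    # Optional branch condition: given the data, returns an index into `next`.
    branch: Optional[Callable[[bytes], int]] = None
    # Indices of successor nodes within the trace.
    next: list = field(default_factory=list)

@dataclass
class Trace:
    nodes: list
    # Address of the next trace to run after this one completes.
    next_trace: "Optional[Trace]" = None

def run_trace(trace, data):
    """Walk the trace; in AccelFlow this would proceed without CPU
    involvement once enqueued. Branch outcomes choose the next node."""
    while trace is not None:
        i = 0
        while i is not None and i < len(trace.nodes):
            node = trace.nodes[i]
            data = node.accel(data)
            if node.branch is not None:
                i = node.next[node.branch(data)]
            else:
                i = node.next[0] if node.next else None
        trace = trace.next_trace
    return data

# Example datacenter-tax pipeline: deserialize, then branch on payload
# size to decide whether a decompression step is needed, then "decrypt".
deserialize = lambda d: d.strip(b"\x00")
decompress  = lambda d: d * 2                      # placeholder transform
decrypt     = lambda d: bytes(b ^ 0xFF for b in d) # XOR stand-in

t = Trace(nodes=[
    Node(deserialize, branch=lambda d: 0 if len(d) > 4 else 1, next=[1, 2]),
    Node(decompress, next=[2]),
    Node(decrypt, next=[]),
])
```

The branch at node 0 mirrors the paper's observation that the accelerator sequence can differ across invocations of the same service: large payloads take the decompression path, small ones skip straight to decryption.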


Wed 4 Feb

Displayed time zone: Hobart

11:30 - 12:50
Efficient Serving and Resource Management (Main Conference) at Coogee
11:30
20m
Talk
Near-Zero-Overhead Freshness for Recommendation Systems via Inference-Side Model Updates
Main Conference
Wenjun Yu Hong Kong Baptist University, Sitian Chen Hong Kong Baptist University, Amelie Chi Zhou Hong Kong Baptist University, Cheng Chen ByteDance, China
11:50
20m
Talk
AccelFlow: Orchestrating an On-Package Ensemble of Fine-Grained Accelerators for Microservices
Main Conference
Jovan Stojkovic University of Illinois Urbana-Champaign, Abraham Farrell University of Illinois Urbana-Champaign, Zhangxiaowen Gong Intel, Christopher J. Hughes Intel, Josep Torrellas University of Illinois Urbana-Champaign
12:10
20m
Talk
SpotCC: Facilitating Coded Computation for Prediction Serving Systems on Spot Instances
Main Conference
Lin Wang, Yuchong Hu Huazhong University of Science and Technology, Ziling Duan Huazhong University of Science and Technology, Mingqi Li Huazhong University of Science and Technology, Chenxuan Yao Huazhong University of Science and Technology, Feifan Liu Huazhong University of Science and Technology, Xiaolu Li Huazhong University of Science and Technology, Leihua Qin Huazhong University of Science and Technology, Dan Feng Huazhong University of Science and Technology, China
12:30
20m
Talk
LowCarb: Carbon-Aware Scheduling of Serverless Functions
Main Conference
Rohan Basu Roy University of Utah, Devesh Tiwari Northeastern University