HPCA 2026
Sat 31 January - Wed 4 February 2026 Sydney, Australia
co-located with HPCA/CGO/PPoPP/CC 2026
Wed 4 Feb 2026 11:30 - 11:50 at Coogee - Efficient Serving and Resource Management Chair(s): Mohammad A. Islam

Deep Learning Recommendation Models (DLRMs) underpin personalized services but face a critical freshness-accuracy tradeoff caused by massive parameter synchronization overheads. Production DLRMs deploy decoupled training and inference clusters, where synchronizing petabyte-scale embedding tables (EMTs) causes multi-minute staleness, degrading recommendation quality and revenue. We observe that (1) inference nodes exhibit sustained CPU underutilization (peak ≤20%), and (2) EMT gradients possess an intrinsic low-rank structure, enabling a compact update representation. We present a system that eliminates inter-cluster synchronization by co-locating Low-Rank Adaptation (LoRA) trainers within inference nodes. The system addresses two core challenges: (1) dynamic rank adaptation via singular value monitoring, which constrains memory overhead (<2% of EMTs), and (2) NUMA-aware resource scheduling with hardware-enforced QoS, which eliminates update-inference contention (P99 latency impact <10 ms). Evaluations show that the system reduces update costs by 2× versus delta-update baselines while achieving higher accuracy within 1-hour windows. By transforming idle inference resources into freshness engines, it delivers online model updates while outperforming state-of-the-art methods by 1.41–2.44% in accuracy.
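The abstract names dynamic rank adaptation via singular value monitoring under a memory cap (<2% of EMTs) but does not spell out the rule. The Python sketch below illustrates one plausible reading, under loudly stated assumptions: each EMT of shape N × d receives a LoRA update stored as factors A (N × r) and B (r × d); the rank is the smallest r whose top singular values of the accumulated gradient capture a chosen energy fraction, capped by the largest r that fits the memory budget. The energy-threshold criterion, the function names (lora_overhead, max_rank_for_budget, choose_rank, lookup) and the table sizes are hypothetical, not the authors' implementation. In practice the singular values might be obtained with something like numpy.linalg.svd(grad, compute_uv=False); here they are passed in directly so the sketch runs on the standard library alone.

```python
import math

# Hypothetical sketch (not the paper's code): LoRA-style update for one EMT.
# The frozen table E is N x d; the update is stored as A (N x r) and B (r x d),
# so a lookup returns E[row] + A[row] @ B.


def lora_overhead(num_rows: int, dim: int, rank: int) -> float:
    """Extra memory of A and B as a fraction of the full N x d table."""
    return rank * (num_rows + dim) / (num_rows * dim)


def max_rank_for_budget(num_rows: int, dim: int, budget: float) -> int:
    """Largest rank r with r * (N + d) <= budget * N * d."""
    return math.floor(budget * num_rows * dim / (num_rows + dim))


def choose_rank(singular_values, energy: float = 0.9, max_rank=None) -> int:
    """Smallest r whose top-r singular values hold `energy` of the squared
    spectrum (assumed criterion), optionally capped by a memory-derived max_rank."""
    s = sorted(singular_values, reverse=True)
    total = sum(x * x for x in s)
    if total == 0:
        return 0  # no gradient signal: skip the update entirely
    acc = 0.0
    rank = len(s)
    for i, x in enumerate(s, start=1):
        acc += x * x
        if acc >= energy * total:
            rank = i
            break
    return rank if max_rank is None else min(rank, max_rank)


def lookup(table, A, B, row: int):
    """Apply the low-rank delta on the fly, without materializing A @ B."""
    a = A[row]  # length-r coefficients for this row
    return [
        table[row][j] + sum(a[k] * B[k][j] for k in range(len(a)))
        for j in range(len(table[row]))
    ]


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    cap = max_rank_for_budget(num_rows=1_000_000, dim=128, budget=0.02)
    r = choose_rank([10.0, 5.0, 1.0, 0.5, 0.1], energy=0.995, max_rank=cap)
    print(cap, r, round(lora_overhead(1_000_000, 128, r), 4))
```

Under this assumed factorization the overhead is r(N + d)/(Nd) = r/d + r/N, which is roughly r/d when N ≫ d, so with d = 128 a 2% budget admits at most rank 2 per table even when the spectrum alone would ask for more. Computing E[row] + A[row]B at lookup time keeps the serving-side copy compact instead of materializing a full-size delta.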

Wed 4 Feb

Displayed time zone: Hobart

11:30 - 12:50
Efficient Serving and Resource Management (Main Conference) at Coogee
Chair(s): Mohammad A. Islam (University of Texas at Arlington)
11:30 | 20m | Talk
Near-Zero-Overhead Freshness for Recommendation Systems via Inference-Side Model Updates
Main Conference
Wenjun Yu (Hong Kong Baptist University), Sitian Chen (Hong Kong Baptist University), Amelie Chi Zhou (Hong Kong Baptist University), Cheng Chen (ByteDance, China)
11:50 | 20m | Talk
AccelFlow: Orchestrating an On-Package Ensemble of Fine-Grained Accelerators for Microservices
Main Conference
Jovan Stojkovic (University of Illinois at Urbana-Champaign), Abraham Farrell (University of Illinois Urbana-Champaign), Zhangxiaowen Gong (Intel), Christopher J. Hughes (Intel), Josep Torrellas (University of Illinois at Urbana-Champaign)
12:10 | 20m | Talk
SpotCC: Facilitating Coded Computation for Prediction Serving Systems on Spot Instances
Main Conference
Lin Wang, Yuchong Hu (Huazhong University of Science and Technology), Ziling Duan (Huazhong University of Science and Technology), Mingqi Li (Huazhong University of Science and Technology), Chenxuan Yao (Huazhong University of Science and Technology), feifanliu (Huazhong University of Science and Technology), Xiaolu Li (Huazhong University of Science and Technology), Leihua Qin (Huazhong University of Science and Technology), Dan Feng (Huazhong University of Science and Technology, China)
12:30 | 20m | Talk
LowCarb: Carbon-Aware Scheduling of Serverless Functions
Main Conference
Rohan Basu Roy (University of Utah), Devesh Tiwari (Northeastern University)