Near-Zero-Overhead Freshness for Recommendation Systems via Inference-Side Model Updates
Deep Learning Recommendation Models (DLRMs) underpin personalized services but face a critical freshness-accuracy tradeoff caused by massive parameter synchronization overheads. Production DLRMs deploy decoupled training and inference clusters, where synchronizing petabyte-scale embedding tables (EMTs) causes multi-minute staleness that degrades recommendation quality and revenue. We observe that (1) inference nodes exhibit sustained CPU underutilization (peak ≤20%), and (2) EMT gradients possess an intrinsic low-rank structure that enables a compact update representation. We present \sysname{}, a system that eliminates inter-cluster synchronization by co-locating Low-Rank Adaptation (LoRA) trainers within inference nodes. \sysname{} addresses two core challenges: (1) dynamic rank adaptation via singular value monitoring, which keeps memory overhead below 2% of the EMTs, and (2) NUMA-aware resource scheduling with hardware-enforced QoS, which isolates updates from inference (P99 latency impact <10 ms). Evaluations show that \sysname{} reduces update costs by 2× relative to delta-update baselines while achieving higher accuracy within 1-hour windows. By turning idle inference resources into freshness engines, \sysname{} delivers online model updates and outperforms state-of-the-art methods by 1.41–2.44% in accuracy.
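The abstract says that EMT updates are low-rank and that ranks are adapted "via singular value monitoring" under a memory bound of less than 2%, but it gives no formula. The following is a minimal sketch under stated assumptions, not the authors' method. We assume the update to an N×d table is factored as ΔE ≈ AB with A ∈ R^{N×r} and B ∈ R^{r×d}. The adapter then costs r(N+d) parameters, an overhead of r(N+d)/(Nd) ≈ r/d when N ≫ d. The rank is chosen by a spectral-energy threshold, which is our assumption and not a criterion stated by the authors, and is then capped by the 2% budget. All function names, the `energy` parameter, and the example table shape are hypothetical.

```python
from math import fsum


def select_rank_by_energy(singular_values, energy=0.9):
    """Smallest rank r whose top-r singular values retain at least `energy`
    of the total spectral energy (sum of squared singular values).

    ASSUMPTION: an energy threshold stands in for the unspecified
    "singular value monitoring" rule in the abstract.
    """
    s = sorted((abs(x) for x in singular_values), reverse=True)
    total = fsum(x * x for x in s)
    if total == 0.0:
        return 0  # nothing to capture: no adapter needed
    kept = 0.0
    for r, x in enumerate(s, start=1):
        kept += x * x
        if kept / total >= energy:
            return r
    return len(s)  # rounding fallback: keep the full spectrum


def lora_overhead(num_rows, dim, rank):
    """Parameters of A (num_rows x rank) plus B (rank x dim), as a
    fraction of the dense table's num_rows * dim parameters."""
    return rank * (num_rows + dim) / (num_rows * dim)


def max_rank_for_budget(num_rows, dim, budget=0.02):
    """Largest rank whose adapter overhead stays within `budget`
    (default 2%, the bound quoted in the abstract)."""
    return int(budget * num_rows * dim // (num_rows + dim))


def adapt_rank(singular_values, num_rows, dim, energy=0.9, budget=0.02):
    """Rank the spectrum asks for, capped by the memory budget.
    A result of 0 means no adapter fits the budget for this table."""
    return min(select_rank_by_energy(singular_values, energy),
               max_rank_for_budget(num_rows, dim, budget))


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    sv = [10.0, 3.0, 1.0, 0.5]
    rows, dim = 10_000_000, 128
    r = adapt_rank(sv, rows, dim, energy=0.95)
    print(r, round(lora_overhead(rows, dim, r), 4))
```

For a hypothetical table with d = 128, the 2% budget admits at most r = 2. At that rank the overhead is about 1.6%, while r = 3 would reach about 2.3%. Narrow embeddings therefore leave very little rank headroom, which is consistent with the abstract's emphasis on adapting the rank rather than fixing it.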
Wed 4 Feb | Displayed time zone: Hobart
11:30 - 12:50 | Efficient Serving and Resource Management | Main Conference at Coogee | Chair(s): Mohammad A. Islam (University of Texas at Arlington)
11:30 | 20m Talk | Near-Zero-Overhead Freshness for Recommendation Systems via Inference-Side Model Updates | Main Conference | Wenjun Yu (Hong Kong Baptist University), Sitian Chen (Hong Kong Baptist University), Amelie Chi Zhou (Hong Kong Baptist University), Cheng Chen (ByteDance, China)
11:50 | 20m Talk | AccelFlow: Orchestrating an On-Package Ensemble of Fine-Grained Accelerators for Microservices | Main Conference | Jovan Stojkovic (University of Illinois at Urbana-Champaign), Abraham Farrell (University of Illinois Urbana-Champaign), Zhangxiaowen Gong (Intel), Christopher J. Hughes (Intel), Josep Torrellas (University of Illinois at Urbana-Champaign)
12:10 | 20m Talk | SpotCC: Facilitating Coded Computation for Prediction Serving Systems on Spot Instances | Main Conference | Lin Wang, Yuchong Hu (Huazhong University of Science and Technology), Ziling Duan (Huazhong University of Science and Technology), Mingqi Li (Huazhong University of Science and Technology), Chenxuan Yao (Huazhong University of Science and Technology), feifanliu (Huazhong University of Science and Technology), Xiaolu Li (Huazhong University of Science and Technology), Leihua Qin (Huazhong University of Science and Technology), Dan Feng (Huazhong University of Science and Technology, China)
12:30 | 20m Talk | LowCarb: Carbon-Aware Scheduling of Serverless Functions | Main Conference