PIMphony: Overcoming Bandwidth and Capacity Inefficiency in PIM-based Long-Context LLM Inference System
This program is tentative and subject to change.
The expansion of long-context Large Language Models (LLMs) creates significant memory system challenges. While Processing-in-Memory (PIM) is a promising accelerator, we identify that it suffers from critical inefficiencies when scaled to long contexts: severe channel underutilization, performance-limiting I/O bottlenecks, and massive memory waste from static KV-Cache management. In this work, we propose PIMphony, a PIM orchestrator that systematically resolves these issues with three co-designed techniques. First, Token-Centric PIM Partitioning (TCP) ensures high channel utilization regardless of batch size. Second, Dynamic PIM Command Scheduling (DCS) mitigates the I/O bottleneck by overlapping data movement and computation. Finally, a Direct PIM Access (DPA) controller enables dynamic memory management to eliminate static memory waste. Implemented via an MLIR-based compiler and evaluated on a cycle-accurate simulator, PIMphony significantly improves throughput for long-context LLM inference (up to 72B parameters and 1M context). Our evaluations show performance boosts of up to 11.3× on PIM-only systems and 8.4× on xPU+PIM systems, enabling more efficient deployment of LLMs in real-world long-context applications.
Mon 2 Feb (displayed time zone: Hobart)
11:30 - 12:50
11:30 (20m) Talk | PIMphony: Overcoming Bandwidth and Capacity Inefficiency in PIM-based Long-Context LLM Inference System | Main Conference | hyucksung kwon Hanyang University, Kyungmo Koo Hanyang University, Janghyeon Kim Hanyang University, Woongkyu Lee Hanyang University, Minjae Lee Hanyang University, Gyeonggeun Jung KAIST, Hyungdeok Lee Solution Advanced Technology, SK hynix, Yousub Jung Solution Advanced Technology, SK hynix, Jaehan Park Solution Advanced Technology, SK hynix, Yosub Song Solution Advanced Technology, SK hynix, Byeongsu Yang Solution Advanced Technology, SK hynix, Haerang Choi Solution Advanced Technology, SK hynix, Guhyun Kim Solution Advanced Technology, SK hynix, Jongsoon Won Solution Advanced Technology, SK hynix, Woojae Shin Solution Advanced Technology, SK hynix, Changhyun Kim Solution Advanced Technology, SK hynix, Shin Gyeongcheol Solution Advanced Technology, SK hynix, Yongkee Kwon Tenstorrent, Ilkon Kim Solution Advanced Technology, SK hynix, Euicheol Lim SK hynix, John Kim KAIST, Jungwook Choi Hanyang University
11:50 (20m) Talk | Adaptive Draft Sequence Length: Enhancing Speculative Decoding Throughput on PIM-Enabled Systems | Main Conference | Runze Wang Huazhong University of Science and Technology, Qinggang Wang Huazhong University of Science and Technology, Haifeng Liu Huazhong University of Science and Technology, Long Zheng Huazhong University of Science and Technology, XIAOFEI LIAO Huazhong University of Science and Technology, Hai Jin Huazhong University of Science and Technology, Jingling Xue University of New South Wales
12:10 (20m) Talk | Conduit: Programmer-Transparent Near-Data Processing Using Multiple Compute-Capable Resources in SSDs | Main Conference | Rakesh Nadig ETH Zurich, Vamanan Arulchelvan ETH Zurich, Mayank Kabra ETH Zurich, Harshita Gupta ETH Zurich, Rahul Bera ETH Zurich, Nika Mansouri Ghiasi ETH Zurich, Nanditha Rao ETH Zurich, Qingcai Jiang ETH Zurich, Andreas Kosmas Kakolyris ETH Zurich, Yu Liang ETH Zurich, Mohammad Sadrosadati ETH Zürich, Onur Mutlu ETH Zurich
12:30 (20m) Talk | Inter-Die Interconnection Networks for Reducing Peak Current Overlaps in Next-Generation NAND Systems | Main Conference