HPCA 2026
Sat 31 January - Wed 4 February 2026 Sydney, Australia
co-located with HPCA/CGO/PPoPP/CC 2026
Wed 4 Feb 2026 10:50 - 11:10 at Coogee - Graph Neural Networks and Retrieval Systems Chair(s): Amir Yazdanbakhsh

Retrieval-Augmented Generation (RAG) systems combine vector similarity search with large language models (LLMs) to deliver accurate, context-aware responses. However, co-locating the vector retriever and the LLM on shared GPU infrastructure introduces significant challenges: vector search is memory- and I/O-intensive, while LLM inference demands high throughput and low latency. Naive resource sharing often leads to severe performance degradation, particularly under high request load or large index sizes.

We present VectorLiteRAG, a deployment-friendly RAG system that achieves latency-compliant inference without requiring additional hardware resources. VectorLiteRAG introduces a fine-grained GPU resource allocation mechanism based on detailed performance modeling and access pattern analysis. By estimating search latency and query hit rate distributions, it identifies an optimal index partitioning point across CPU and GPU tiers to minimize contention and maximize throughput.
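The partitioning idea can be illustrated with a toy sketch. It is not the authors' implementation: the linear latency model, the per-cluster access counts, and the function names (`hit_rate_curve`, `expected_search_latency_ms`, `pick_partition_point`) are all assumptions made for illustration. The sketch places the hottest index clusters on the GPU and picks the smallest such set whose modelled search latency fits a given budget.

```python
from itertools import accumulate


def hit_rate_curve(access_counts):
    """Fraction of accesses served from the GPU when the k hottest clusters
    are placed there, for k = 0..len(access_counts)."""
    ordered = sorted(access_counts, reverse=True)
    total = sum(ordered)
    return [0.0] + [c / total for c in accumulate(ordered)]


def expected_search_latency_ms(hit_rate, gpu_ms, cpu_ms):
    # Assumed toy model: GPU hits cost gpu_ms, CPU misses cost cpu_ms.
    return hit_rate * gpu_ms + (1.0 - hit_rate) * cpu_ms


def pick_partition_point(access_counts, gpu_ms, cpu_ms, budget_ms):
    """Smallest number of hot clusters to keep on the GPU so that the
    modelled search latency meets budget_ms; None if no split does."""
    for k, hit_rate in enumerate(hit_rate_curve(access_counts)):
        if expected_search_latency_ms(hit_rate, gpu_ms, cpu_ms) <= budget_ms:
            return k
    return None


# Hypothetical skewed access counts for seven index clusters.
counts = [500, 250, 120, 60, 40, 20, 10]
k = pick_partition_point(counts, gpu_ms=2.0, cpu_ms=20.0, budget_ms=8.0)
print(k)
```

Choosing the smallest split that meets the budget keeps as much GPU memory as possible available to the LLM, which is the intuition behind minimizing contention. A real system would replace the linear model with measured latency profiles.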

Our evaluations show that VectorLiteRAG consistently expands the SLO-compliant request rate range across all tested configurations, covering both small and large LLMs and both small and large vector databases, compared to naive baselines and state-of-the-art alternatives. VectorLiteRAG improves the attainable SLO throughput by up to 1.5× without compromising generation quality or requiring additional compute resources.

Wed 4 Feb

Displayed time zone: Hobart

09:50 - 11:10
Graph Neural Networks and Retrieval Systems
Main Conference at Coogee
Chair(s): Amir Yazdanbakhsh Google Research, Brain Team
09:50
20m
Talk
VeloxGNN: Accelerating Out-of-Core based GNN Training with Low Data Migration and High Accuracy via Delayed Gradient Propagation
Main Conference
Yi Li University of Texas at Dallas, Tsun-Yu Yang Center for Computational Evolutionary Intelligence, Electrical & Computer Engineering, Duke University, Zhaoyan Shen Shandong University, Ming-Chang Yang The Chinese University of Hong Kong (CUHK), Bingzhe Li University of Texas at Dallas
10:10
20m
Talk
AutoGNN: End-to-End Hardware-Driven Graph Preprocessing for Enhanced GNN Performance
Main Conference
Seungkwan Kang KAIST, Seungjun Lee KAIST, Donghyun Gouk Panmnesia, Miryeong Kwon Panmnesia, Hyunkyu Choi Panmnesia, Junhyeok Jang Panmnesia, Sangwon Lee Panmnesia, Huiwon Choi KAIST, Jie Zhang Peking University, Wonil Choi Hanyang University, Mahmut Taylan Kandemir Pennsylvania State University, Myoungsoo Jung KAIST
10:30
20m
Talk
Scaling Graph Neural Network Training via Geometric Optimization
Main Conference
Fangzhou Ye University of Central Florida, Lingxiang Yin University of Central Florida, Hao Zheng University of Central Florida
10:50
20m
Talk
VectorLiteRAG: Latency-Aware and Fine-Grained Resource Partitioning for Efficient RAG
Main Conference
Junkyum Kim Georgia Institute of Technology, Divya Mahajan Georgia Institute of Technology