RoMe: Row Granularity Access Memory System for Large Language Models
Modern HBM-based memory systems have evolved over successive generations while retaining cache-line-granularity accesses. Preserving this fine granularity has required the introduction of bank groups and pseudo channels, which expand the timing parameters and control overhead and thereby increase the scheduling complexity of the memory controller. Large Language Models (LLMs) now dominate machine learning workloads and stream contiguous data blocks ranging from several kilobytes to several megabytes per operation. In a conventional HBM-based memory system, each such block must be divided into hundreds of 32B cache line transfers, forcing the memory controller to employ unnecessarily intricate scheduling and leading to growing inefficiency. To address this problem, we propose RoMe. RoMe accesses DRAM at row granularity and removes columns, bank groups, and pseudo channels from the memory interface. This design reduces the number of bank states and timing parameters and dramatically shrinks the depth of the request queue otherwise required for reordering. Despite these simplifications, RoMe maintains performance on representative LLM workloads, demonstrating that an HBM-based memory system can be streamlined without performance loss and offering a practical alternative for future generations.
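To make the access-count arithmetic in the abstract concrete, the short Python sketch below (not from the paper) counts the requests needed to stream one contiguous block at 32 B cache-line granularity versus row granularity. The 1 KB row size and the example block sizes are illustrative assumptions, not parameters taken from RoMe or from any specific HBM generation.

# Rough sketch (not from the paper): request counts for streaming one contiguous
# block at cache-line vs. row granularity. The 32 B cache line matches the
# abstract; the 1 KB row size and the block sizes are illustrative assumptions.
import math

CACHE_LINE_BYTES = 32      # per-request transfer size in a conventional HBM system
ASSUMED_ROW_BYTES = 1024   # assumed DRAM row size, for illustration only

def requests_needed(block_bytes: int, granularity_bytes: int) -> int:
    """Number of memory requests to stream one contiguous block."""
    return math.ceil(block_bytes / granularity_bytes)

for block_kib in (8, 256, 2048):   # several KB to several MB, as in the abstract
    block_bytes = block_kib * 1024
    print(f"{block_kib:5d} KiB block: "
          f"{requests_needed(block_bytes, CACHE_LINE_BYTES):6d} cache-line requests vs. "
          f"{requests_needed(block_bytes, ASSUMED_ROW_BYTES):5d} row-granularity requests")

Under these assumptions, an 8 KiB block already maps to 256 cache-line requests but only 8 row-granularity requests, which is the kind of request-queue reduction the abstract alludes to.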
Tue 3 Feb (displayed time zone: Hobart)

14:10 - 15:30 | Memory Systems for Scalable Computing | Main Conference at Collaroy | Chair(s): Alexandros Daglis (Georgia Tech)
14:10, 20m, Talk | BARD: Reducing Write Latency of DDR5 Memory by Exploiting Bank-Parallelism | Main Conference
14:30, 20m, Talk | RoMe: Row Granularity Access Memory System for Large Language Models | Main Conference | Hwayong Nam (Seoul National University), Seungmin Baek (Seoul National University), Jumin Kim (Seoul National University), Michael Jaemin Kim (Meta), Jung Ho Ahn (Seoul National University) | Pre-print
14:50, 20m, Talk | HDPAT: Hierarchical Distributed Page Address Translation for Wafer-Scale GPUs | Main Conference | Daoxuan Xu (William & Mary), Ying Li (William & Mary), Yuwei Sun (UIUC), Jie Ren (William & Mary), Yifan Sun (William & Mary)
15:10, 20m, Talk | Pulse: Fine-Grained Hierarchical Hashing Index for Disaggregated Memory | Main Conference | Guangyang Deng (Xiamen University), Zixiang Yu (Xiamen University), Zhirong Shen (Xiamen University), Qiangsheng Su (Xiamen University), Jiwu Shu (Xiamen University)