Athena: Synergizing Data Prefetching and Off-Chip Prediction via Online Reinforcement Learning
This program is tentative and subject to change.
Prefetching and off-chip prediction are two techniques proposed to hide long memory access latencies in high-performance processors. In this work, we demonstrate that: (1) an off-chip predictor (OCP) is a viable alternative to data prefetchers for improving performance, especially in bandwidth-constrained processors, since accurately identifying off-chip memory requests is often easier than predicting the full cacheline address of future memory requests; yet (2) naively combining an OCP with a prefetcher negates the OCP’s performance benefits; and (3) existing prefetcher control policies (both heuristic- and learning-based) either lack the ability to coordinate OCPs with prefetchers in \emph{any} cache level, or leave significant room for performance improvement.
Our goal is to design a single holistic framework that can autonomously learn to synergize off-chip prediction with multiple prefetching techniques throughout the cache hierarchy by taking multiple system-level metrics into account, delivering \emph{consistent} performance benefits, regardless of the underlying prefetcher-OCP combination, workload, and system configuration.
To this end, we propose a new technique, Athena, that models the coordination between prefetchers and OCP as a reinforcement learning (RL) problem. Athena observes system-level metrics (e.g., prefetcher/OCP accuracy, bandwidth usage), and uses them as \emph{state} information to select a coordination \emph{action} (i.e., enabling the prefetcher and/or OCP, and adjusting prefetcher aggressiveness). Athena makes a key observation that using the improvement in performance as the only reward signal, as done by prior works, can be unreliable, since it does not account for inherent workload changes. To mitigate this, Athena introduces a new holistic reward framework that disentangles the metrics correlated to its own actions (e.g., change in performance) from uncorrelated metrics (e.g., change in mispredicted branch instructions). This allows Athena to autonomously learn a coordination policy by isolating the true impact of its actions from inherent variations in the workload.
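The state/action/reward framing above can be illustrated with a small sketch. The abstract does not say which RL algorithm, state encoding, action set, or reward formula Athena uses, so everything below is an assumption made for illustration: tabular Q-learning with epsilon-greedy exploration is swapped in as a generic stand-in, and the names `ACTIONS`, `discretize`, `reward`, and `Coordinator`, the bucket count, and the damping formula are all hypothetical. The only idea taken from the text is that the reward should not trust a performance change blindly when a metric unrelated to the agent's actions (here, branch mispredictions) has also shifted.

```python
import random
from collections import defaultdict

# Hypothetical action set: (prefetcher_enabled, ocp_enabled, prefetch_degree).
# The real action space is not given in the abstract.
ACTIONS = [
    (False, False, 0),
    (False, True, 0),
    (True, False, 1),
    (True, False, 4),
    (True, True, 1),
    (True, True, 4),
]


def discretize(prefetch_acc, ocp_acc, bw_util, bins=4):
    """Map three metrics in [0, 1] to a tuple of bucket indices (the state)."""
    def bucket(x):
        return min(int(x * bins), bins - 1)
    return (bucket(prefetch_acc), bucket(ocp_acc), bucket(bw_util))


def reward(delta_ipc, delta_branch_mispred, damping=1.0):
    """Assumed reward shape: damp the IPC change when an action-independent
    metric (branch mispredictions) moved, hinting at a workload change."""
    return delta_ipc / (1.0 + damping * abs(delta_branch_mispred))


class Coordinator:
    """Generic tabular Q-learning agent; a stand-in, not Athena's design."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(lambda: [0.0] * len(ACTIONS))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def select(self, state):
        # Explore with probability epsilon, otherwise pick the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(ACTIONS))
        values = self.q[state]
        return max(range(len(ACTIONS)), key=values.__getitem__)

    def update(self, state, action, r, next_state):
        # Standard one-step Q-learning update.
        target = r + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

In a simulator, one would call `select` at the start of each epoch, apply `ACTIONS[index]` to the prefetcher and OCP, measure the metrics at the end of the epoch, and pass the resulting `reward(...)` value to `update`. The damping term is only one possible way to discount performance changes that coincide with workload variation.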
Our extensive evaluation using a diverse set of 100 memory-intensive workloads shows that Athena \emph{consistently} outperforms multiple prior state-of-the-art coordination policies (e.g., heuristic-driven HPAC, learning-based TLP and MAB) across a wide range of system configurations with various combinations of underlying prefetchers at various cache levels, OCPs, and main memory bandwidth, while incurring only modest storage overhead. We intend to open-source Athena to facilitate future research.
Tue 3 Feb (displayed time zone: Hobart)
11:30 - 12:50
11:30 | 20m | Talk | Athena: Synergizing Data Prefetching and Off-Chip Prediction via Online Reinforcement Learning | Main Conference | Zhenrong Lang ETH Zürich, Rahul Bera ETH Zurich, Caroline Hengartner ETH Zürich, Konstantinos Kanellopoulos ETH Zurich, Rakesh Kumar NTNU, Mohammad Sadrosadati ETH Zürich, Onur Mutlu ETH Zurich
11:50 | 20m | Talk | Streamlined On-Chip Temporal Prefetching | Main Conference
12:10 | 20m | Talk | Intermittence-Aware Cache Compression | Main Conference | Gan Fang Purdue University, Jianping Zeng Arizona State University, Yuchen Zhou Purdue University, Changhee Jung Purdue University, USA
12:30 | 20m | Talk | SnakeMan: Applying Relation-centric Notation to Model and Optimize Data Swizzle in the Cache of Modern NPU | Main Conference | Hanyu Zhang Zhejiang University, Fangxu Guo Zhejiang University, Liqiang Lu Zhejiang University, Long Wang Huawei Technologies, Yunfei Du Huawei Technologies, Zhe Wang Huawei Technologies, Jinghan Zhang Huawei Technologies, Jie Zhang Peking University, Chenli Xue Zhejiang University, Chengpeng Wu Zhejiang University, Ziyi Zhang Zhejiang University, Eric Liang Peking University, Size Zheng ByteDance, Jianwei Yin Zhejiang University