HPCA 2026
Sat 31 January - Wed 4 February 2026 Sydney, Australia
co-located with HPCA/CGO/PPoPP/CC 2026

This program is tentative and subject to change.

Tue 3 Feb 2026 10:50 - 11:10 at Collaroy - CPU Microarchitecture Optimization

Hardware prefetching is a well-established technique for bridging the processor-memory performance gap. To improve cache miss coverage, modern processors often integrate multiple prefetchers. However, multi-prefetcher systems without proper management often suffer from suboptimal performance due to a surge of useless prefetches. Several techniques have been proposed to select appropriate prefetchers for issuing requests, but they all face limitations. Specifically, existing (1) static schemes lack feedback regulation mechanisms and suffer from inflexible prefetcher selections; (2) RL-based schemes incur high overhead and suffer from adjustment lag; and (3) historical performance-based schemes rely on inefficient runtime metrics that fail to accurately and clearly reflect a prefetcher’s true impact on performance.

In this paper, we propose I-POP, a high-performance and low-overhead prefetcher management scheme for multi-prefetcher systems. I-POP introduces a novel runtime metric, Prefetch Effectiveness (PE), which aggregates each prefetch request’s beneficial and harmful effects to precisely quantify the impact of a prefetcher on IPC performance, effectively overcoming the limitations of prior metrics. To compute and leverage this metric, I-POP incorporates two key components: the Metric Collector, which periodically calculates each prefetcher’s PE, and the Control Engine, which dynamically manages all prefetchers based on their PE values. Specifically, I-POP ignites (enables) prefetchers with positive PE values, adaptively tuning their aggressiveness, and disables those with non-positive PE. We evaluated I-POP on numerous workloads, and the results show I-POP outperforms two state-of-the-art approaches, Bandit and Alecto, by 3.8% and 3.7% across four benchmark suites in a single-core system, and 9.3% and 6.4% in a 16-core system, while incurring only 1.47 KB of storage overhead.
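The control policy described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the PE formula, so the sketch assumes PE is simply a per-epoch benefit count minus a harm count, and it assumes "aggressiveness" means prefetch degree. All names (`PrefetcherState`, `prefetch_effectiveness`, `control_step`, `MAX_DEGREE`) are hypothetical. How PE is measured for a prefetcher that is currently disabled is not covered here.

```python
from dataclasses import dataclass

MAX_DEGREE = 4  # hypothetical cap on prefetch degree (an assumption)


@dataclass
class PrefetcherState:
    name: str
    benefit: int = 0     # assumed per-epoch count of beneficial prefetch effects
    harm: int = 0        # assumed per-epoch count of harmful prefetch effects
    enabled: bool = True
    degree: int = 1      # assumed aggressiveness knob: prefetch degree


def prefetch_effectiveness(p: PrefetcherState) -> int:
    """Assumed aggregation: beneficial effects minus harmful effects."""
    return p.benefit - p.harm


def control_step(prefetchers: list[PrefetcherState]) -> None:
    """Run one control epoch over all prefetchers.

    Positive PE: keep (or turn) the prefetcher on and raise its degree.
    Non-positive PE: turn the prefetcher off.
    """
    for p in prefetchers:
        pe = prefetch_effectiveness(p)
        if pe > 0:
            if p.enabled:
                # Already running: become more aggressive, up to the cap.
                p.degree = min(p.degree + 1, MAX_DEGREE)
            else:
                # Re-ignite conservatively at the lowest degree.
                p.enabled = True
                p.degree = 1
        else:
            p.enabled = False
            p.degree = 0
        # Start the next epoch with fresh counters.
        p.benefit = 0
        p.harm = 0


stride = PrefetcherState("stride", benefit=10, harm=3)
stream = PrefetcherState("stream", benefit=2, harm=5)
idle = PrefetcherState("idle")
control_step([stride, stream, idle])
```

In this sketch the Metric Collector's role is reduced to filling in the `benefit` and `harm` counters during each epoch, and `control_step` plays the part of the Control Engine.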

Tue 3 Feb

Displayed time zone: Hobart

09:50 - 11:10
CPU Microarchitecture Optimization - Main Conference at Collaroy
09:50
20m
Talk
The Last-Level Branch Predictor Revisited
Main Conference
David Schall Technical University of Munich, Mária Ďuračková University of Edinburgh, Boris Grot University of Edinburgh, UK
10:10
20m
Talk
Tempranillo: Non-Speculative Early Register Release
Main Conference
Carlos Escuin Computing Systems Lab, Huawei Technologies Switzerland AG, Paolo Salvatore Galfano Computing Systems Laboratory, Zurich Research Center, Huawei Technologies, Switzerland, Davide Basilio Bartolini Computing Systems Laboratory, Zurich Research Center, Huawei Technologies, Switzerland, Leeor Peled Boole Labs, Tel-Aviv Research Center, Huawei Technologies, Israel, Mehdi Alipour Computing Systems Laboratory, Zurich Research Center, Huawei Technologies, Switzerland
10:30
20m
Talk
SMTcheck: Accurate SMT Interference Prediction to Improve Scheduling Efficiency in Datacenters
Main Conference
Sanghyun Kim Sungkyunkwan University, Jinhyeok Oh Sungkyunkwan University, Taehun Kim Sungkyunkwan University, Gyutae Kim Sungkyunkwan University, Youngsok Kim Yonsei University, Jaehyun Hwang Sungkyunkwan University, Joonsung Kim Sungkyunkwan University
10:50
20m
Talk
I-POP: Ignite Positive Prefetchers
Main Conference
Yiquan Lin Zhejiang University and Alibaba Group, Wenhai Lin Alibaba Group, Yiquan Chen Alibaba Group, Jiexiong Xu Zhejiang University and Alibaba Group, Shishun Cai Alibaba Group, Jiarong Ye Zhejiang University, Zonghui Wang Zhejiang University, Wenzhi Chen Zhejiang University