TraceRTL: Agile Performance Evaluation for Microarchitecture Exploration (HPCA 2026 - Main Conference)

Who

Zifei Zhang, Yinan Xu, Sa Wang, Dan Tang, Yungang Bao

Track

HPCA 2026 Main Conference

Time Zone

The program is currently displayed in (GMT+11:00) Hobart.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Tue 3 Feb 2026 14:50 - 15:10 at Coogee - LLM Systems and Microarchitecture Tools. Chair(s): Josep Torrellas

Abstract

While agile chip development methodologies have accelerated RTL design and simulation, performance evaluation remains constrained by three challenges: (1) inefficient feature prototyping caused by the tight coupling between functional correctness and performance evaluation, particularly for large-scale, error-prone microarchitectures; (2) limited workloads due to incomplete peripheral/software environments or unavailable source code; and (3) time-consuming warm-up phases in sampling-based simulation, required to mitigate cold-start effects.

To address these challenges, we propose TraceRTL, an agile, trace-driven performance evaluation methodology that decouples the functional and performance components of CPU RTL designs. It introduces three techniques: (1) a trace-driven performance exploration framework that bypasses full functional correctness while preserving performance accuracy; (2) a trace transformation technique, TraceBridge, that replays traces across different formats and instruction sets; and (3) a fast warm-up strategy, TraceDedup, that eliminates redundant traces and efficiently initializes microarchitectural states.

Using TraceRTL, we develop the first trace-driven RTL CPU derived from XiangShan, a high-performance out-of-order RISC-V processor. TraceRTL achieves performance accuracies of 99.87% and 99.86% on SPECint2017 and SPECfp2017, respectively. With TraceBridge, we evaluate x86-based Google workload traces on a RISC-V RTL CPU and reveal distinct memory-bound behavior. TraceDedup further accelerates warm-up phases in sampling-based simulations by 1.5× to 11.8×.
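To give a rough sense of why removing redundant trace entries can shorten warm-up, here is a minimal toy sketch. It is not the paper's TraceDedup algorithm, which the abstract does not describe. It assumes a single fully associative LRU cache and a warm-up trace that is just a list of cache-line addresses; the function names `dedup_last_occurrence` and `lru_state` are hypothetical. Under those assumptions, the final cache contents depend only on the order of the last access to each address, so earlier repeats can be dropped.

```python
from collections import OrderedDict


def dedup_last_occurrence(trace):
    """Keep only the final access to each address, ordered by last use."""
    seen = set()
    kept = []
    # Walk backwards so the first time we meet an address is its last access.
    for addr in reversed(trace):
        if addr not in seen:
            seen.add(addr)
            kept.append(addr)
    kept.reverse()
    return kept


def lru_state(trace, capacity):
    """Replay a trace through a toy fully associative LRU cache.

    Returns the resident addresses ordered from least to most recently used.
    """
    cache = OrderedDict()
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)        # hit: refresh recency
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # miss on a full cache: evict LRU
            cache[addr] = True
    return list(cache)


# Hypothetical warm-up trace of cache-line addresses (illustrative values only).
warmup = [0x40, 0x80, 0x40, 0xC0, 0x80, 0x100, 0x40, 0x140]
short = dedup_last_occurrence(warmup)

# The shortened trace leaves the toy cache in the same final state.
assert lru_state(warmup, 3) == lru_state(short, 3)
```

In this toy, the eight-entry trace shrinks to five entries while the cache ends up in the same state. Real microarchitectural state (branch predictors, prefetchers, non-LRU replacement) does not generally have this property, so the sketch only illustrates the redundancy idea, not how TraceRTL initializes state.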

Zifei Zhang

SKLP, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences

Yinan Xu

SKLP, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences

Sa Wang

SKLP, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences

Dan Tang

SKLP, Institute of Computing Technology, Chinese Academy of Sciences; Beijing Institute of Open Source Chip

Yungang Bao

State Key Lab of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences

Session Program

Tue 3 Feb
Displayed time zone: Hobart

14:10 - 15:30	LLM Systems and Microarchitecture Tools, Main Conference, at Coogee. Chair(s): Josep Torrellas

14:10 20m Talk		LILo: Harnessing the On-chip Accelerators in Intel CPUs for Compressed LLM Inference Acceleration. Main Conference. Hyungyo Kim (UIUC), Qirong Xia (UIUC), Jinghan Huang (UIUC), Nachuan Wang (UIUC), Jung Ho Ahn (Seoul National University), Younjoo Lee (Seoul National University), Wajdi K Feghali (Intel), Ren Wang (Intel Labs), Nam Sung Kim (UIUC)
14:30 20m Talk		ReThermal: Co-Design of Thermal-Aware Static and Dynamic Scheduling for LLM Training on Liquid-Cooled Wafer-Scale Chips. Main Conference. Chengran Li (Tsinghua University), Huizheng Wang (Tsinghua University), Jiaxin Liu (Tsinghua University), Jingyao Liu (Tsinghua University), Zhiheng Yue (Tsinghua University), Xia Li (Shanghai AI Lab), Shenfei Jiang (Shanghai AI Lab), Jinyi Deng (Tsinghua University), Yang Hu (Tsinghua University), Shouyi Yin (Tsinghua University)
14:50 20m Talk		TraceRTL: Agile Performance Evaluation for Microarchitecture Exploration. Main Conference. Zifei Zhang (SKLP, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Yinan Xu (SKLP, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Sa Wang (SKLP, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Dan Tang (SKLP, Institute of Computing Technology, Chinese Academy of Sciences; Beijing Institute of Open Source Chip), Yungang Bao (State Key Lab of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences)
15:10 20m Talk		Nugget: Portable Program Snippets. Main Conference. Zhantong Qiu (University of California, Davis), Mahyar Samani (University of California, Davis), Jason Lowe-Power (University of California, Davis & Google)