The Memory Processing Unit: A Generalized Interface for End-to-End In-Memory Execution (HPCA 2026 - Main Conference)

Who

Minh S. Q. Truong, Yiqiu Sun, Dawei Xiong, Amol Shah, Alex Glass, Abraham Farrell, James A. Bain, L. Richard Carley, Saugata Ghose

Track

HPCA 2026 Main Conference

Time Zone

The program is currently displayed in (GMT+11:00) Hobart.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Mon 2 Feb 2026 15:50 - 16:10 at Collaroy - Processing-in-Memory Architectures. Chair(s): Byeongho Kim

Abstract

The processing-using-memory (PUM; a.k.a. in-memory computing) paradigm aims to eliminate data movement energy and performance costs by using memory cell interactions to directly perform computation. Given PUM’s potential for large savings, prior works have proposed many different datapath microarchitectures to demonstrate how general-purpose PUM benefits a wide range of application kernels. Unfortunately, these efforts largely depend on microarchitecture-specific vector-like interfaces that (1) force many of an application’s operations to be offloaded to a CPU, (2) require significant programmer effort to scale up applications to an entire memory chip, and (3) make it impractical to develop badly needed systems software and programming tools for PUM.

To address these three issues, we propose the memory processing unit (MPU), a microarchitecture-agnostic interface layer for general-purpose PUM with three components. First, we develop an MPU instruction set architecture (ISA) with instructions to facilitate application scaling and task coordination. Second, we propose an ensemble execution model that coordinates execution across millions of PUM vector function units and maps to most general-purpose PUM microarchitectures. Third, we design a comprehensive MPU control path that efficiently executes MPU ISA binaries across multiple ensembles, and can enable CPU-free execution of complex end-to-end applications with PUM. We demonstrate how the MPU maps to multiple previously proposed PUM datapaths, and how it achieves average performance/energy improvements of 1.79x/3.23x for 21 data-intensive kernels over these prior works (67x/47x vs. a modern GPU), while also achieving performance and energy improvements for the complex end-to-end applications.

Link to Publication

https://ghose.cs.illinois.edu/papers/26hpca_mpu.pdf

Minh S. Q. Truong

Carnegie Mellon University

Yiqiu Sun

University of Illinois Urbana-Champaign

Dawei Xiong

University of Illinois Urbana-Champaign

Amol Shah

University of Illinois Urbana-Champaign

Alex Glass

Carnegie Mellon University

Abraham Farrell

University of Illinois Urbana-Champaign

James A. Bain

Carnegie Mellon University

L. Richard Carley

Carnegie Mellon University

Saugata Ghose

University of Illinois Urbana-Champaign

Session Program

Mon 2 Feb
Displayed time zone: Hobart

15:50 - 17:10	Processing-in-Memory Architectures (Main Conference) at Collaroy. Chair(s): Byeongho Kim (Samsung Electronics)

15:50 20m Talk		The Memory Processing Unit: A Generalized Interface for End-to-End In-Memory Execution (Main Conference). Minh S. Q. Truong (Carnegie Mellon University), Yiqiu Sun (University of Illinois Urbana-Champaign), Dawei Xiong (University of Illinois Urbana-Champaign), Amol Shah (University of Illinois Urbana-Champaign), Alex Glass (Carnegie Mellon University), Abraham Farrell (University of Illinois Urbana-Champaign), James A. Bain (Carnegie Mellon University), L. Richard Carley (Carnegie Mellon University), Saugata Ghose (University of Illinois Urbana-Champaign)
16:10 20m Talk		CoCoTree: A Computation-Capable Architecture for Collective Communication in Scalable PIM (Main Conference). Shunchen Shi (Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Qijia Yang (Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Fan Yang (Institute of Computing Technology, Chinese Academy of Science), Yu Huang (Huazhong University of Science and Technology), Youwei Zhuo (Peking University), Zhichun Li (Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Ninghui Sun (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences), Xueqi Li (State Key Lab of Processors, Institute of Computing Technology, CAS)
16:30 20m Talk		PIM-malloc: A Fast and Scalable Dynamic Memory Allocator for Processing-In-Memory (PIM) Architectures (Main Conference). Dongjae Lee (KAIST), Bongjoon Hyun (Samsung), Youngjin Kwon (KAIST), Minsoo Rhu (KAIST)
16:50 20m Talk		Count2Multiply: Reliable In-Memory High-Radix Counting (Main Conference). Joao Paulo Cardoso de Lima (TU Dresden, ScaDS.AI), Benjamin F. Morris III (Duke University), Asif Ali Khan (TU Dresden, Germany), Jeronimo Castrillon (TU Dresden, Germany), Alex Jones (Syracuse University)