CoCoTree: A Computation-Capable Architecture for Collective Communication in Scalable PIM (HPCA 2026 - Main Conference)

Who

Shunchen Shi, Qijia Yang, Fan Yang, Yu Huang, Youwei Zhuo, Zhichun Li, Ninghui Sun, Xueqi Li

Track

HPCA 2026 Main Conference

Time Zone

The program is currently displayed in (GMT+11:00) Hobart.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Mon 2 Feb 2026 16:10 - 16:30 at Collaroy - Processing-in-Memory Architectures. Chair(s): Byeongho Kim

Abstract

The growing demand for high-bandwidth and large-capacity memory access in data-intensive workloads has driven the development and deployment of Processing-in-Memory (PIM) architectures. However, existing DIMM-based PIM systems suffer from a severe communication bottleneck between the processing elements (PEs) near the PIM banks, because inter-PE traffic must be forwarded by the host CPU. This bottleneck limits the efficiency of collective operations and degrades the scalability and performance of workloads that require inter-PE communication.

To address this communication limitation, we propose CoCoTree, a computation-capable architecture for collective communication in scalable DIMM-based PIM. CoCoTree supports direct, high-throughput inter-PE communication without host intervention, and accelerates key collective communication operations using a novel hierarchical binary tree topology and lightweight in-network computation support. We design and implement microarchitectures for the main building blocks, Co-Leaf and Co-Node, to efficiently handle data packing, routing, and processing in CoCoTree. We also introduce a packet-based communication protocol tailored to the CoCoTree architecture, which decouples control and data through a two-phase configuration-computation communication mechanism to efficiently support a wide range of collective communication operations. CoCoTree effectively mitigates inter-PE communication bottlenecks, enabling scalable PIM systems capable of meeting the demands of growing data sizes. Experimental results show that CoCoTree achieves up to 95.6× improvement for collective operations and improves end-to-end application performance by up to 10.5× across various workloads over the baseline PIM, while outperforming state-of-the-art PIM communication architectures in both performance and scalability.
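
The general idea of combining a binary tree topology with in-network computation can be illustrated with a small software model. The sketch below is not CoCoTree's microarchitecture or protocol, which the abstract does not detail. It only assumes that leaves hold one value per PE, that every internal node applies the reduction operator to its two children, and that the root's result is then copied back to all leaves. The function name `tree_allreduce` and its parameters are hypothetical.

```python
from typing import Callable, List, Tuple


def tree_allreduce(values: List[int],
                   op: Callable[[int, int], int]) -> Tuple[List[int], int]:
    """Model an all-reduce over a complete binary tree.

    Assumption (not from the paper): one value per leaf, and a leaf count
    that is a power of two. Returns the per-leaf results and the number of
    tree levels crossed in each direction.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("number of leaves must be a power of two")

    level = list(values)
    levels = 0
    # Reduce phase: each internal node combines the values of its two children,
    # so the data is reduced inside the tree rather than at a central host.
    while len(level) > 1:
        level = [op(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels += 1

    # Broadcast phase: the root's result is copied back down to every leaf.
    return [level[0]] * n, levels


if __name__ == "__main__":
    results, levels = tree_allreduce(list(range(1, 9)), lambda a, b: a + b)
    print(results, levels)
```

With 8 leaves the model crosses 3 levels in each direction, which is the logarithmic depth that makes tree-based collectives attractive as the number of PEs grows.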

Shunchen Shi

Institute of Computing Technology, Chinese Academy of Sciences ; University of Chinese Academy of Sciences

Qijia Yang

Institute of Computing Technology, Chinese Academy of Sciences ; University of Chinese Academy of Sciences

Fan Yang

Institute of Computing Technology, Chinese Academy of Sciences

Yu Huang

Huazhong University of Science and Technology

Youwei Zhuo

Peking University

Zhichun Li

Institute of Computing Technology, Chinese Academy of Sciences ; University of Chinese Academy of Sciences

Ninghui Sun

State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences

Xueqi Li

State Key Lab of Processors, Institute of Computing Technology, CAS

China

Session Program

Mon 2 Feb
Displayed time zone: Hobart

15:50 - 17:10	Processing-in-Memory Architectures (Main Conference) at Collaroy. Chair(s): Byeongho Kim (Samsung Electronics)

15:50 20m Talk	The Memory Processing Unit: A Generalized Interface for End-to-End In-Memory Execution (Main Conference)
	Minh S. Q. Truong (Carnegie Mellon University), Yiqiu Sun (University of Illinois Urbana-Champaign), Dawei Xiong (University of Illinois Urbana-Champaign), Amol Shah (University of Illinois Urbana-Champaign), Alex Glass (Carnegie Mellon University), Abraham Farrell (University of Illinois Urbana-Champaign), James A. Bain (Carnegie Mellon University), L. Richard Carley (Carnegie Mellon University), Saugata Ghose (University of Illinois Urbana-Champaign)
16:10 20m Talk	CoCoTree: A Computation-Capable Architecture for Collective Communication in Scalable PIM (Main Conference)
	Shunchen Shi (Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Qijia Yang (Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Fan Yang (Institute of Computing Technology, Chinese Academy of Sciences), Yu Huang (Huazhong University of Science and Technology), Youwei Zhuo (Peking University), Zhichun Li (Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Ninghui Sun (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences), Xueqi Li (State Key Lab of Processors, Institute of Computing Technology, CAS)
16:30 20m Talk	PIM-malloc: A Fast and Scalable Dynamic Memory Allocator for Processing-In-Memory (PIM) Architectures (Main Conference)
	Dongjae Lee (KAIST), Bongjoon Hyun (Samsung), Youngjin Kwon (KAIST), Minsoo Rhu (KAIST)
16:50 20m Talk	Count2Multiply: Reliable In-Memory High-Radix Counting (Main Conference)
	Joao Paulo Cardoso de Lima (TU Dresden, ScaDS.AI), Benjamin F. Morris III (Duke University), Asif Ali Khan (TU Dresden, Germany), Jeronimo Castrillon (TU Dresden, Germany), Alex Jones (Syracuse University)