SAGe: A Lightweight Algorithm-Architecture Co-Design for Mitigating the Data Preparation Bottleneck in Large-Scale Genome Sequence Analysis
This program is tentative and subject to change.
Genome sequence analysis, which analyzes the DNA sequences of organisms, drives advances in many critical medical and biotechnological fields. Given its importance and the exponentially growing volumes of genomic sequence data, there are extensive efforts to accelerate genome sequence analysis. In this work, we demonstrate a major bottleneck that greatly diminishes the benefits of state-of-the-art genome sequence analysis accelerators: the data preparation bottleneck, where genomic sequence data is stored in compressed form and must be decompressed and formatted before an accelerator can operate on it. To mitigate this bottleneck, we propose SAGe, an algorithm-architecture co-design for highly compressed storage and high-performance access of large-scale genomic sequence data. The key challenge is to improve data preparation performance while maintaining high compression ratios (comparable to genomics-specific compression algorithms) at low hardware cost. We address this challenge by leveraging key properties of genomic datasets to co-design (i) a new (de)compression algorithm, (ii) hardware that decompresses data with lightweight operations and efficient streaming accesses, (iii) a storage data layout, and (iv) interface commands to access data. SAGe is highly versatile, as it supports datasets from different sequencing technologies and species. Thanks to its lightweight design, SAGe can be seamlessly integrated with a broad range of genome sequence analysis hardware accelerators to mitigate their data preparation bottlenecks. Our results demonstrate that SAGe improves the average end-to-end performance and energy efficiency of two state-of-the-art genome sequence analysis accelerators by 3.0×–32.1× and 13.0×–34.0×, respectively, compared to when the accelerators rely on state-of-the-art decompression tools.
Tue 3 Feb (displayed time zone: Hobart)
17:15 - 18:15
17:15 (20m, Talk) | GenPairX: A Hardware-Algorithm Co-Designed Accelerator for Paired-End Read Mapping | Main Conference | Julien Eudine (Huawei Technologies Switzerland AG), Chu Li (Huawei Zurich Research Center), Zhuo Cheng (Huawei Zurich Research Center), Renzo Andri (Huawei Technologies Switzerland AG), Onur Mutlu (ETH Zurich), Can Firtina (ETH Zurich and UMD), Mohammad Sadrosadati (ETH Zürich), Nika Mansouri Ghiasi (ETH Zurich), Konstantina Koliogeorgi (ETH Zurich), Anirban Nag (Huawei Zurich Research Center), Arash Tavakkol (Huawei Zurich Research Center), Haiyu Mao (King's College London), Shai Bergman (Huawei Zurich Research Center), Ji Zhang (Huawei Zurich Research Center)
17:35 (20m, Talk) | SAGe: A Lightweight Algorithm-Architecture Co-Design for Mitigating the Data Preparation Bottleneck in Large-Scale Genome Sequence Analysis | Main Conference | Nika Mansouri Ghiasi (ETH Zurich), Talu Güloglu (ETH Zurich), Harun Mustafa (ETH Zurich), Can Firtina (ETH Zurich and UMD), Konstantina Koliogeorgi (ETH Zurich), Konstantinos Kanellopoulos (ETH Zurich), Haiyu Mao (King's College London), Rakesh Nadig (ETH Zurich), Mohammad Sadrosadati (ETH Zürich), Jisung Park (POSTECH, Pohang University of Science and Technology), Onur Mutlu (ETH Zurich)
17:55 (20m, Talk) | NP-CAM: Efficient and Scalable DNA Classification using a NoC-Partitioned CAM Architecture | Main Conference | Benjamin F. Morris III (Duke University), Tergel Molom-Ochir (Duke University), Changchun Zhou (Duke University), Yiran Chen (Duke University), Alex Jones (Syracuse University), Hai "Helen" Li (Duke University)