SegAlign: A scalable GPU-based whole genome aligner

Sneha D. Goenka, Yatish Turakhia, Benedict Paten, Mark Horowitz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

13 Scopus citations

Abstract

Pairwise Whole Genome Alignment (WGA) is a crucial first step to understanding evolution at the DNA sequence level. Pairwise WGA of the thousands of currently available species genomes could help make biological discoveries; however, computing them for even a fraction of the millions of possible pairs is prohibitive: WGA of a single pair of vertebrate genomes (human-mouse) takes 11 hours on a 96-core Amazon Web Services (AWS) instance (c5.24xlarge). This paper presents SegAlign, a scalable, GPU-accelerated system for computing pairwise WGA. SegAlign is based on the standard seed-filter-extend heuristic, in which the filtering stage dominates the runtime (e.g., 98% for human-mouse WGA) and is accelerated using GPU(s). Using three vertebrate genome pairs, we show that SegAlign provides a speedup of up to 14× on an 8-GPU, 64-core AWS instance (p3.16xlarge) for WGA and a nearly 2.3× reduction in dollar cost. SegAlign also allows parallelization over multiple GPU nodes and scales efficiently.
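For readers unfamiliar with the seed-filter-extend heuristic named in the abstract, the following is a minimal CPU-only sketch in Python. It is not SegAlign's implementation: the function names, the exact-match k-mer seeds, the +1/-1 scoring, the x-drop rule, and the default parameter values are all illustrative assumptions, and production aligners typically use more elaborate seeds and scoring. It only shows the shape of the three stages: find short exact matches (seed), keep those whose ungapped extension scores well (filter), and run a costlier gapped alignment on the survivors (extend).

```python
from collections import defaultdict


def build_seed_index(target, k):
    """Seed stage helper: map each k-mer of the target to its start positions."""
    index = defaultdict(list)
    for i in range(len(target) - k + 1):
        index[target[i:i + k]].append(i)
    return index


def ungapped_extend(target, query, t, q, k, match=1, mismatch=-1, xdrop=3):
    """Filter stage helper: extend a seed hit along its diagonal without gaps.

    Extension in each direction stops once the running score falls more than
    `xdrop` below the best score seen so far (an assumed, simplified rule).
    Returns (score, target_start, query_start, length) of the best segment.
    """
    def walk(ti, qi, step):
        score = best = best_len = n = 0
        while 0 <= ti < len(target) and 0 <= qi < len(query):
            score += match if target[ti] == query[qi] else mismatch
            n += 1
            if score > best:
                best, best_len = score, n
            elif best - score > xdrop:
                break
            ti += step
            qi += step
        return best, best_len

    right, right_len = walk(t + k, q + k, 1)
    left, left_len = walk(t - 1, q - 1, -1)
    return (k * match + left + right, t - left_len, q - left_len,
            k + left_len + right_len)


def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Extend stage helper: Smith-Waterman local alignment score (linear gaps)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def seed_filter_extend(target, query, k=5, threshold=8, pad=3):
    """Run the three stages; all default values are illustrative assumptions."""
    index = build_seed_index(target, k)
    segments = {}
    # Seed + filter: look up each query k-mer, score its ungapped extension,
    # and keep only segments at or above the threshold (deduplicated).
    for q in range(len(query) - k + 1):
        for t in index.get(query[q:q + k], ()):
            score, ts, qs, length = ungapped_extend(target, query, t, q, k)
            if score >= threshold:
                segments[(ts, qs, length)] = score
    # Extend: gapped local alignment on a padded window around each survivor.
    results = []
    for (ts, qs, length), score in sorted(segments.items()):
        t_win = target[max(0, ts - pad):ts + length + pad]
        q_win = query[max(0, qs - pad):qs + length + pad]
        results.append((ts, qs, length, score,
                        local_alignment_score(t_win, q_win)))
    return results


if __name__ == "__main__":
    # Two toy sequences sharing the 9-base core "ACTGATCCA".
    print(seed_filter_extend("GGGGGACTGATCCAGGGGG", "TTTTACTGATCCATTTT"))
```

In this sketch the filter stage scores every seed hit independently of the others, which suggests why such a stage can be spread across many parallel workers.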

Original language: English (US)
Title of host publication: Proceedings of SC 2020
Subtitle of host publication: International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Computer Society
ISBN (Electronic): 9781728199986
DOIs
State: Published - Nov 2020
Externally published: Yes
Event: 2020 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2020 - Virtual, Atlanta, United States
Duration: Nov 9 2020 → Nov 19 2020

Publication series

Name: International Conference for High Performance Computing, Networking, Storage and Analysis, SC
Volume: 2020-November
ISSN (Print): 2167-4329
ISSN (Electronic): 2167-4337

Conference

Conference: 2020 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2020
Country/Territory: United States
City: Virtual, Atlanta
Period: 11/9/20 → 11/19/20

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Software

Keywords

  • Apache Spark
  • Comparative Genomics
  • Graphics Processing Unit (GPU)
  • Whole Genome Alignment
