Bayesian nonparametric discovery of isoforms and individual specific quantification

Derek Aguiar, Li Fang Cheng, Bianca Dumitrascu, Fantine Mordelet, Athma A. Pai, Barbara Engelhardt Martin

Research output: Contribution to journal › Article

Abstract

Most human protein-coding genes can be transcribed into multiple distinct mRNA isoforms. These alternative splicing patterns encourage molecular diversity, and dysregulation of isoform expression plays an important role in disease etiology. However, isoforms are difficult to characterize from short-read RNA-seq data because they share identical subsequences and occur in different frequencies across tissues and samples. Here, we develop biisq, a Bayesian nonparametric model for isoform discovery and individual specific quantification from short-read RNA-seq data. biisq does not require isoform reference sequences but instead estimates an isoform catalog shared across samples. We use stochastic variational inference for efficient posterior estimates and demonstrate superior precision and recall for simulations compared to state-of-the-art isoform reconstruction methods. biisq shows the most gains for low abundance isoforms, with 36% more isoforms correctly inferred at low coverage versus a multi-sample method and 170% more versus single-sample methods. We estimate isoforms in the GEUVADIS RNA-seq data and validate inferred isoforms by associating genetic variants with isoform ratios.
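The validation step in the abstract associates genetic variants with isoform ratios, meaning each isoform's share of its gene's total expression in one individual. The following is a minimal sketch of that idea under stated assumptions. The abundances for two isoforms of one hypothetical gene are made up. Genotypes are coded additively as 0/1/2 copies of the alternate allele. The association is a plain least-squares slope. The function names and data are hypothetical, and this is not biisq's inference procedure or the association test used in the paper.

```python
from statistics import fmean


def isoform_ratios(abundances):
    """Convert one individual's isoform abundances for a gene into ratios summing to 1.

    Hypothetical helper: returns all zeros when the gene has no expression.
    """
    total = sum(abundances)
    if total == 0:
        return [0.0] * len(abundances)
    return [a / total for a in abundances]


def ols_slope(genotypes, ratios):
    """Least-squares slope of an isoform ratio on genotype dosage (0, 1 or 2).

    Assumes additive genotype coding; this is an illustrative test, not the paper's.
    """
    mx, my = fmean(genotypes), fmean(ratios)
    sxx = sum((x - mx) ** 2 for x in genotypes)
    if sxx == 0:
        raise ValueError("genotypes are constant; slope is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(genotypes, ratios))
    return sxy / sxx


# Made-up abundances of two isoforms of one gene in five individuals.
samples = [[90.0, 10.0], [70.0, 30.0], [50.0, 50.0], [85.0, 15.0], [45.0, 55.0]]
# Made-up genotype dosages at one variant for the same five individuals.
genotypes = [0, 1, 2, 0, 2]

# Ratio of the first isoform in each individual.
iso0 = [isoform_ratios(s)[0] for s in samples]
slope = ols_slope(genotypes, iso0)
print(iso0)
print(round(slope, 4))
```

With these invented numbers the first isoform's ratio falls as the alternate allele count rises, giving a slope of about -0.2. Working with ratios rather than raw abundances separates a change in splicing from a change in overall gene expression.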

Original language: English (US)
Article number: 1681
Journal: Nature Communications
Volume: 9
Issue number: 1
DOI: 10.1038/s41467-018-03402-w
State: Published - Dec 1 2018

Fingerprint

Protein Isoforms
estimates
etiology
splicing
inference
genes
catalogs
coding
RNA
proteins
RNA Isoforms
Alternative Splicing
simulation
Genes
Tissue

All Science Journal Classification (ASJC) codes

  • Chemistry (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Physics and Astronomy (all)

Cite this

Aguiar, Derek ; Cheng, Li Fang ; Dumitrascu, Bianca ; Mordelet, Fantine ; Pai, Athma A. ; Engelhardt Martin, Barbara. / Bayesian nonparametric discovery of isoforms and individual specific quantification. In: Nature Communications. 2018 ; Vol. 9, No. 1.
@article{f110b65f8b014c3d8c6a1c3cb1fb50a7,
title = "Bayesian nonparametric discovery of isoforms and individual specific quantification",
abstract = "Most human protein-coding genes can be transcribed into multiple distinct mRNA isoforms. These alternative splicing patterns encourage molecular diversity, and dysregulation of isoform expression plays an important role in disease etiology. However, isoforms are difficult to characterize from short-read RNA-seq data because they share identical subsequences and occur in different frequencies across tissues and samples. Here, we develop biisq, a Bayesian nonparametric model for isoform discovery and individual specific quantification from short-read RNA-seq data. biisq does not require isoform reference sequences but instead estimates an isoform catalog shared across samples. We use stochastic variational inference for efficient posterior estimates and demonstrate superior precision and recall for simulations compared to state-of-the-art isoform reconstruction methods. biisq shows the most gains for low abundance isoforms, with 36{\%} more isoforms correctly inferred at low coverage versus a multi-sample method and 170{\%} more versus single-sample methods. We estimate isoforms in the GEUVADIS RNA-seq data and validate inferred isoforms by associating genetic variants with isoform ratios.",
author = "Derek Aguiar and Cheng, {Li Fang} and Bianca Dumitrascu and Fantine Mordelet and Pai, {Athma A.} and {Engelhardt Martin}, Barbara",
year = "2018",
month = "12",
day = "1",
doi = "10.1038/s41467-018-03402-w",
language = "English (US)",
volume = "9",
journal = "Nature Communications",
issn = "2041-1723",
publisher = "Nature Publishing Group",
number = "1",

}

Bayesian nonparametric discovery of isoforms and individual specific quantification. / Aguiar, Derek; Cheng, Li Fang; Dumitrascu, Bianca; Mordelet, Fantine; Pai, Athma A.; Engelhardt Martin, Barbara.

In: Nature Communications, Vol. 9, No. 1, 1681, 01.12.2018.


TY  - JOUR
T1  - Bayesian nonparametric discovery of isoforms and individual specific quantification
AU  - Aguiar, Derek
AU  - Cheng, Li Fang
AU  - Dumitrascu, Bianca
AU  - Mordelet, Fantine
AU  - Pai, Athma A.
AU  - Engelhardt Martin, Barbara
PY  - 2018/12/1
Y1  - 2018/12/1
N2  - Most human protein-coding genes can be transcribed into multiple distinct mRNA isoforms. These alternative splicing patterns encourage molecular diversity, and dysregulation of isoform expression plays an important role in disease etiology. However, isoforms are difficult to characterize from short-read RNA-seq data because they share identical subsequences and occur in different frequencies across tissues and samples. Here, we develop biisq, a Bayesian nonparametric model for isoform discovery and individual specific quantification from short-read RNA-seq data. biisq does not require isoform reference sequences but instead estimates an isoform catalog shared across samples. We use stochastic variational inference for efficient posterior estimates and demonstrate superior precision and recall for simulations compared to state-of-the-art isoform reconstruction methods. biisq shows the most gains for low abundance isoforms, with 36% more isoforms correctly inferred at low coverage versus a multi-sample method and 170% more versus single-sample methods. We estimate isoforms in the GEUVADIS RNA-seq data and validate inferred isoforms by associating genetic variants with isoform ratios.
AB  - Most human protein-coding genes can be transcribed into multiple distinct mRNA isoforms. These alternative splicing patterns encourage molecular diversity, and dysregulation of isoform expression plays an important role in disease etiology. However, isoforms are difficult to characterize from short-read RNA-seq data because they share identical subsequences and occur in different frequencies across tissues and samples. Here, we develop biisq, a Bayesian nonparametric model for isoform discovery and individual specific quantification from short-read RNA-seq data. biisq does not require isoform reference sequences but instead estimates an isoform catalog shared across samples. We use stochastic variational inference for efficient posterior estimates and demonstrate superior precision and recall for simulations compared to state-of-the-art isoform reconstruction methods. biisq shows the most gains for low abundance isoforms, with 36% more isoforms correctly inferred at low coverage versus a multi-sample method and 170% more versus single-sample methods. We estimate isoforms in the GEUVADIS RNA-seq data and validate inferred isoforms by associating genetic variants with isoform ratios.
UR  - http://www.scopus.com/inward/record.url?scp=85046272187&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85046272187&partnerID=8YFLogxK
U2  - 10.1038/s41467-018-03402-w
DO  - 10.1038/s41467-018-03402-w
M3  - Article
VL  - 9
JO  - Nature Communications
JF  - Nature Communications
SN  - 2041-1723
IS  - 1
M1  - 1681
ER  - 