Detecting sources of transcriptional heterogeneity in large-scale RNA-seq data sets

Brian C. Searle, Rachel M. Gittelman, Ohad Manor, Joshua M. Akey

Research output: Contribution to journal > Article > peer-review



Gene expression levels are dynamic molecular phenotypes that respond to biological, environmental, and technical perturbations. Here we use a novel replicate-classifier approach for discovering transcriptional signatures and apply it to the Genotype-Tissue Expression data set. We identified many factors contributing to expression heterogeneity, such as collection center and ischemia time, and our approach of scoring replicate classifiers allows us to statistically stratify these factors by effect strength. Strikingly, from transcriptional expression in blood alone we detect markers that help predict heart disease and stroke in some patients. Our results illustrate the challenges and opportunities of interpreting patterns of transcriptional variation in large-scale data sets.

Original language: English (US)
Pages (from-to): 1391-1396
Number of pages: 6
Issue number: 4
State: Published - Dec 2016
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • General Medicine


Keywords

  • GTEx Consortium
  • Gene expression normalization
  • Random Forest classification
  • Transcriptional heterogeneity

