Gene set bagging for estimating the probability a statistically significant result will replicate

Andrew E. Jaffe, John D. Storey, Hongkai Ji, Jeffrey T. Leek

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

Background: Significance analysis plays a major role in identifying and ranking genes, transcription factor binding sites, DNA methylation regions, and other high-throughput features associated with illness. We propose a new approach, called gene set bagging, for measuring the probability that a gene set replicates in future studies. Gene set bagging involves resampling the original high-throughput data, performing gene-set analysis on the resampled data, and confirming that biological categories replicate in the bagged samples.

Results: Using both simulated and publicly-available genomics data, we demonstrate that significant categories in a gene set enrichment analysis may be unstable when subjected to resampling. We show our method estimates the replication probability (R), the probability that a gene set will replicate as a significant result in future studies, and show in simulations that this method reflects replication better than each set's p-value.

Conclusions: Our results suggest that gene lists based on p-values are not necessarily stable, and therefore additional steps like gene set bagging may improve biological inference on gene sets.
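The resampling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the procedural details, so the following choices are assumptions made for illustration: samples are resampled with replacement within each group, genes are ranked by an absolute Welch t-statistic, the top `top_n` genes are called significant, enrichment is tested with a one-sided hypergeometric test, and R is estimated as the fraction of resamples in which the set reaches `alpha`. All function and parameter names are hypothetical.

```python
import math
import random
import statistics


def t_stat(a, b):
    """Welch t-statistic between two lists of values (0.0 if both have no variance)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return 0.0 if se == 0 else (statistics.mean(a) - statistics.mean(b)) / se


def hypergeom_pvalue(k, N, K, n):
    """P(X >= k) when n genes are drawn from N genes, K of which are in the set."""
    total = math.comb(N, n)
    upper = min(K, n)
    return sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enriched(expr, cases, controls, gene_set, top_n, alpha):
    """True if gene_set is over-represented among the top_n ranked genes."""
    scores = {
        gene: abs(t_stat([vals[i] for i in cases], [vals[i] for i in controls]))
        for gene, vals in expr.items()
    }
    top = set(sorted(scores, key=scores.get, reverse=True)[:top_n])
    overlap = len(top & set(gene_set))
    return hypergeom_pvalue(overlap, len(expr), len(gene_set), top_n) < alpha


def replication_probability(expr, cases, controls, gene_set,
                            top_n=10, alpha=0.05, n_boot=100, seed=0):
    """Fraction of bootstrap resamples in which gene_set is called significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        # Resample sample indices with replacement, separately within each group.
        boot_cases = rng.choices(cases, k=len(cases))
        boot_controls = rng.choices(controls, k=len(controls))
        hits += enriched(expr, boot_cases, boot_controls, gene_set, top_n, alpha)
    return hits / n_boot


# Toy data (entirely synthetic): 50 genes, 8 cases and 8 controls.
# Genes g0..g9 are shifted upward in the cases; all others are pure noise.
data_rng = random.Random(1)
cases = list(range(8))
controls = list(range(8, 16))
expr = {
    f"g{j}": [data_rng.gauss(0, 1) + (6.0 if j < 10 and i < 8 else 0.0) for i in range(16)]
    for j in range(50)
}

signal_set = {f"g{j}" for j in range(10)}    # truly differential genes
null_set = {f"g{j}" for j in range(40, 50)}  # noise-only genes

r_signal = replication_probability(expr, cases, controls, signal_set)
r_null = replication_probability(expr, cases, controls, null_set)
print("R (signal set):", r_signal)
print("R (null set):", r_null)
```

In this toy setting the set built from the shifted genes should be called significant in nearly every resample, while the noise-only set should almost never be. A real analysis would substitute the study's own differential-expression statistic and gene-set test for the simplified ones assumed here.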

Original language: English (US)
Article number: 360
Journal: BMC Bioinformatics
Volume: 14
Issue number: 1
State: Published - Dec 12 2013

All Science Journal Classification (ASJC) codes

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Keywords

  • DNA methylation
  • Gene expression
  • Gene ontology
  • Gene set enrichment analysis
