Gene set bagging for estimating the probability a statistically significant result will replicate

Andrew E. Jaffe, John D. Storey, Hongkai Ji, Jeffrey T. Leek

Research output: Contribution to journal › Article › peer-review

7 Scopus citations


Background: Significance analysis plays a major role in identifying and ranking genes, transcription factor binding sites, DNA methylation regions, and other high-throughput features associated with illness. We propose a new approach, called gene set bagging, for measuring the probability that a gene set replicates in future studies. Gene set bagging involves resampling the original high-throughput data, performing gene-set analysis on the resampled data, and confirming that biological categories replicate in the bagged samples.

Results: Using both simulated and publicly available genomics data, we demonstrate that significant categories in a gene set enrichment analysis may be unstable when subjected to resampling. We show that our method estimates the replication probability (R), the probability that a gene set will replicate as a significant result in future studies, and show in simulations that this method reflects replication better than each set's p-value.

Conclusions: Our results suggest that gene lists based on p-values are not necessarily stable, and therefore additional steps like gene set bagging may improve biological inference on gene sets.
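The resampling procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every specific in it is an assumption made for illustration: a two-group design, a per-gene Welch t-statistic, a hypergeometric over-representation test on the `n_top` highest-ranked genes, stratified bootstrap resampling of samples, and the fraction of bagged samples in which a set is significant as a simple stand-in estimate of R. All function and parameter names (`welch_t`, `hypergeom_tail`, `enrichment_pvalues`, `gene_set_bagging`, `n_boot`, `n_top`, `alpha`) are hypothetical.

```python
import math
import numpy as np


def welch_t(expr, labels):
    """Per-gene Welch t-statistic (group 1 minus group 0); expr is genes x samples."""
    a, b = expr[:, labels == 0], expr[:, labels == 1]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1])
    return (b.mean(axis=1) - a.mean(axis=1)) / (se + 1e-12)  # guard against zero variance


def hypergeom_tail(k, n_genes, set_size, n_top):
    """P(X >= k) when n_top of n_genes are selected and set_size genes belong to the set."""
    total = math.comb(n_genes, n_top)
    hits = range(k, min(set_size, n_top) + 1)
    return sum(
        math.comb(set_size, i) * math.comb(n_genes - set_size, n_top - i) for i in hits
    ) / total


def enrichment_pvalues(expr, labels, gene_sets, n_top):
    """Over-representation p-value of each set among the n_top genes ranked by |t|."""
    ranked = np.argsort(-np.abs(welch_t(expr, labels)))
    top = set(ranked[:n_top].tolist())
    n_genes = expr.shape[0]
    return {
        name: hypergeom_tail(len(top & set(members)), n_genes, len(members), n_top)
        for name, members in gene_sets.items()
    }


def gene_set_bagging(expr, labels, gene_sets, n_boot=100, n_top=20, alpha=0.05, seed=0):
    """Fraction of bootstrap resamples in which each gene set is significant at alpha."""
    rng = np.random.default_rng(seed)
    counts = dict.fromkeys(gene_sets, 0)
    for _ in range(n_boot):
        # Resample samples with replacement within each group (stratified bootstrap).
        idx = np.concatenate([
            rng.choice(np.flatnonzero(labels == g), size=int((labels == g).sum()), replace=True)
            for g in (0, 1)
        ])
        pvals = enrichment_pvalues(expr[:, idx], labels[idx], gene_sets, n_top)
        for name, p in pvals.items():
            counts[name] += int(p < alpha)
    return {name: c / n_boot for name, c in counts.items()}


# Toy data (simulated, not from the paper): 200 genes, 10 samples per group,
# with the first 20 genes shifted upward in group 1.
rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 10)
expr = rng.normal(size=(200, 20))
expr[:20, labels == 1] += 4.0
gene_sets = {"signal": list(range(20)), "null": list(range(100, 120))}
R = gene_set_bagging(expr, labels, gene_sets, n_boot=50)
```

In this toy setting, `R` maps each gene set name to a value between 0 and 1. A set whose significance depends on a few influential samples would tend to receive a lower value than its single-analysis p-value alone might suggest, which is the instability the abstract describes.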

Original language: English (US)
Article number: 360
Journal: BMC Bioinformatics
Issue number: 1
State: Published - Dec 12 2013

All Science Journal Classification (ASJC) codes

  • Applied Mathematics
  • Molecular Biology
  • Structural Biology
  • Biochemistry
  • Computer Science Applications


Keywords

  • DNA methylation
  • Gene expression
  • Gene ontology
  • Gene set enrichment analysis

