Nonparametric methods for identifying differentially expressed genes in microarray data

Olga G. Troyanskaya, Mitchell E. Garber, Patrick O. Brown, David Botstein, Russ B. Altman

Research output: Contribution to journal › Article

230 Scopus citations

Abstract

Motivation: Gene expression experiments provide a fast and systematic way to identify disease markers relevant to clinical care. In this study, we address the problem of robust identification of differentially expressed genes from microarray data. Differentially expressed genes, or discriminator genes, are genes with significantly different expression in two user-defined groups of microarray experiments. We compare three model-free approaches: (1) nonparametric t-test, (2) Wilcoxon (or Mann-Whitney) rank sum test, and (3) a heuristic method based on high Pearson correlation to a perfectly differentiating gene ('ideal discriminator method'). We systematically assess the performance of each method based on simulated and biological data under varying noise levels and p-value cutoffs.

Results: All methods exhibit very low false positive rates and identify a large fraction of the differentially expressed genes in simulated data sets with noise level similar to that of actual data. Overall, the rank sum test appears most conservative, which may be advantageous when the computationally identified genes need to be tested biologically. However, if a more inclusive list of markers is desired, a higher p-value cutoff or the nonparametric t-test may be appropriate. When applied to data from lung tumor and lymphoma data sets, the methods identify biologically relevant differentially expressed genes that allow clear separation of groups in question. Thus the methods described and evaluated here provide a convenient and robust way to identify differentially expressed genes for further biological and clinical analysis.
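The three approaches named in the abstract can be illustrated for a single gene with a small Python sketch. This is not the authors' implementation, and several choices here are assumptions: the "nonparametric t-test" is read as an exact permutation test on a Welch t statistic, the rank sum p-value is also obtained by permutation rather than from tables, and the "ideal discriminator" is taken to be the 0/1 group-label profile itself. All function names are hypothetical.

```python
import itertools
import math
import statistics


def welch_t(values, labels):
    """Welch t statistic between group 1 and group 0 (labels are 0/1)."""
    g0 = [v for v, l in zip(values, labels) if l == 0]
    g1 = [v for v, l in zip(values, labels) if l == 1]
    diff = statistics.mean(g1) - statistics.mean(g0)
    se = math.sqrt(statistics.variance(g1) / len(g1)
                   + statistics.variance(g0) / len(g0))
    if se == 0:  # both groups constant: avoid dividing by zero
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_stat(values, labels):
    """Rank sum of group 1, centered at its expected value under no difference."""
    ranks = average_ranks(values)
    n1 = sum(labels)
    w = sum(r for r, l in zip(ranks, labels) if l == 1)
    return w - n1 * (len(values) + 1) / 2


def permutation_p_value(values, labels, statistic):
    """Two-sided exact p-value over all relabelings that keep group sizes.

    Assumption: sample sizes are small enough to enumerate every relabeling.
    """
    n, n1 = len(labels), sum(labels)
    observed = abs(statistic(values, labels))
    hits = total = 0
    for ones in itertools.combinations(range(n), n1):
        chosen = set(ones)
        permuted = [1 if i in chosen else 0 for i in range(n)]
        if abs(statistic(values, permuted)) >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total


def ideal_discriminator_r(values, labels):
    """Pearson correlation between a gene and the 0/1 group profile.

    Assumption: neither the gene nor the label vector is constant.
    """
    mv, ml = statistics.mean(values), statistics.mean(labels)
    cov = sum((v - mv) * (l - ml) for v, l in zip(values, labels))
    sv = math.sqrt(sum((v - mv) ** 2 for v in values))
    sl = math.sqrt(sum((l - ml) ** 2 for l in labels))
    return cov / (sv * sl)


# Made-up expression values for one gene across two groups of four arrays.
expression = [1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
print("permutation t-test p:", permutation_p_value(expression, groups, welch_t))
print("rank sum p:", permutation_p_value(expression, groups, rank_sum_stat))
print("ideal discriminator r:", ideal_discriminator_r(expression, groups))
```

With four arrays per group there are only 70 distinct relabelings, so the smallest two-sided p-value either permutation test can report is 2/70 (about 0.029). In a real analysis the same statistics would be computed for every gene, and a p-value or correlation cutoff would then select the discriminator list.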

Original language: English (US)
Pages (from-to): 1454-1461
Number of pages: 8
Journal: Bioinformatics
Volume: 18
Issue number: 11
State: Published - Nov 1 2002
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics
