TY - JOUR
T1 - Nonparametric methods for identifying differentially expressed genes in microarray data
AU - Troyanskaya, Olga G.
AU - Garber, Mitchell E.
AU - Brown, Patrick O.
AU - Botstein, David
AU - Altman, Russ B.
N1 - Funding Information:
We would like to thank Max Diehn, Sandrine Dudoit, Mike Liang, Art Owen, Gavin Sherlock, Michael Whitfield, and Soumya Raychaudhari for thoughtful comments and discussions. This research was supported by grants to DB (CA77097) and POB (CA85129). O.G.T. is supported by a Howard Hughes Medical Institute Predoctoral Fellowship and a Stanford Graduate Fellowship. RBA is supported by NIH-GM61374, NIH-LM06244, NSF DBI-9600637, SUN Microsystems and a grant from the Burroughs-Wellcome Foundation.
PY - 2002/11/1
Y1 - 2002/11/1
AB - Motivation: Gene expression experiments provide a fast and systematic way to identify disease markers relevant to clinical care. In this study, we address the problem of robust identification of differentially expressed genes from microarray data. Differentially expressed genes, or discriminator genes, are genes with significantly different expression in two user-defined groups of microarray experiments. We compare three model-free approaches: (1) nonparametric t-test, (2) Wilcoxon (or Mann-Whitney) rank sum test, and (3) a heuristic method based on high Pearson correlation to a perfectly differentiating gene ('ideal discriminator method'). We systematically assess the performance of each method based on simulated and biological data under varying noise levels and p-value cutoffs. Results: All methods exhibit very low false positive rates and identify a large fraction of the differentially expressed genes in simulated data sets with noise level similar to that of actual data. Overall, the rank sum test appears most conservative, which may be advantageous when the computationally identified genes need to be tested biologically. However, if a more inclusive list of markers is desired, a higher p-value cutoff or the nonparametric t-test may be appropriate. When applied to data from lung tumor and lymphoma data sets, the methods identify biologically relevant differentially expressed genes that allow clear separation of groups in question. Thus the methods described and evaluated here provide a convenient and robust way to identify differentially expressed genes for further biological and clinical analysis.
UR - http://www.scopus.com/inward/record.url?scp=0036856209&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0036856209&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/18.11.1454
DO - 10.1093/bioinformatics/18.11.1454
M3 - Article
C2 - 12424116
AN - SCOPUS:0036856209
SN - 1367-4803
VL - 18
SP - 1454
EP - 1461
JO - Bioinformatics
JF - Bioinformatics
IS - 11
ER -