TY - JOUR
T1 - To how many simultaneous hypothesis tests can normal, Student's t or bootstrap calibration be applied?
AU - Fan, Jianqing
AU - Hall, Peter
AU - Yao, Qiwei
N1 - Funding Information:
Jianqing Fan is Frederick L. Moore ’18 Professor of Finance, Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, and Director, Center for Statistical Research, Academy of Mathematics and Systems Science, Beijing, China. Peter Hall is Professor, Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC 3010, Australia. Qiwei Yao is Professor, Department of Statistics, London School of Economics, London WC2A 2AE, U.K., and Guanghua School of Management, Peking University, China. Fan’s work was sponsored in part by National Science Foundation grants DMS-0354223 and DMS-0704337, National Institutes of Health grant R01-GM07261, and NSF grant 10628104 of China. Yao’s work was sponsored in part by EPSRC grant EP/C549058. The authors thank the joint editors, associate editor, and two anonymous referees for their helpful comments that led to the improvement of the manuscript.
PY - 2007/12
Y1 - 2007/12
N2 - In the analysis of microarray data, and in some other contemporary statistical problems, it is not uncommon to apply hypothesis tests in a highly simultaneous way. The number, N say, of tests used can be much larger than the sample sizes, n, to which the tests are applied, yet we wish to calibrate the tests so that the overall level of the simultaneous test is accurate. Often the sampling distribution is quite different for each test, so there may not be an opportunity to combine data across samples. In this setting, how large can N be, as a function of n, before level accuracy becomes poor? Here we answer this question in cases where the statistic under test is of Student's t type. We show that if either the normal or Student t distribution is used for calibration, then the level of the simultaneous test is accurate provided that log N increases at a strictly slower rate than n^(1/3) as n diverges. On the other hand, if bootstrap methods are used for calibration, then we may choose log N almost as large as n^(1/2) and still achieve asymptotic-level accuracy. The implications of these results are explored both theoretically and numerically.
AB - In the analysis of microarray data, and in some other contemporary statistical problems, it is not uncommon to apply hypothesis tests in a highly simultaneous way. The number, N say, of tests used can be much larger than the sample sizes, n, to which the tests are applied, yet we wish to calibrate the tests so that the overall level of the simultaneous test is accurate. Often the sampling distribution is quite different for each test, so there may not be an opportunity to combine data across samples. In this setting, how large can N be, as a function of n, before level accuracy becomes poor? Here we answer this question in cases where the statistic under test is of Student's t type. We show that if either the normal or Student t distribution is used for calibration, then the level of the simultaneous test is accurate provided that log N increases at a strictly slower rate than n^(1/3) as n diverges. On the other hand, if bootstrap methods are used for calibration, then we may choose log N almost as large as n^(1/2) and still achieve asymptotic-level accuracy. The implications of these results are explored both theoretically and numerically.
KW - Bonferroni's inequality
KW - Edgeworth expansion
KW - Genetic data
KW - Large-deviation expansion
KW - Level accuracy
KW - Microarray data
KW - Quantile estimation
KW - Skewness
KW - Student's t statistic
UR - http://www.scopus.com/inward/record.url?scp=38349049320&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=38349049320&partnerID=8YFLogxK
U2 - 10.1198/016214507000000969
DO - 10.1198/016214507000000969
M3 - Article
AN - SCOPUS:38349049320
SN - 0162-1459
VL - 102
SP - 1282
EP - 1288
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 480
ER -