TY - JOUR

T1 - To how many simultaneous hypothesis tests can normal, Student's t or bootstrap calibration be applied?

AU - Fan, Jianqing

AU - Hall, Peter

AU - Yao, Qiwei

N1 - Funding Information:
Jianqing Fan is Frederick L. Moore’18 Professor of Finance, Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, and Director, Center for Statistical Research, Academy of Mathematics and Systems Science, Beijing, China. Peter Hall is Professor, Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC 3010, Australia. Qiwei Yao is Professor, Department of Statistics, London School of Economics, London WC2A 2AE, U.K., and Guanghua School of Management, Peking University, China. Fan’s work was sponsored in part by National Science Foundation grants DMS-0354223 and DMS-0704337, National Institutes of Health grant R01-GM07261, and NSF grant 10628104 of China. Yao’s work was sponsored in part by EPSRC grant EP/C549058. The authors thank the joint editors, associate editor, and two anonymous referees for their helpful comments that led to the improvement of the manuscript.

PY - 2007/12

Y1 - 2007/12

N2 - In the analysis of microarray data, and in some other contemporary statistical problems, it is not uncommon to apply hypothesis tests in a highly simultaneous way. The number, N say, of tests used can be much larger than the sample sizes, n, to which the tests are applied, yet we wish to calibrate the tests so that the overall level of the simultaneous test is accurate. Often the sampling distribution is quite different for each test, so there may not be an opportunity to combine data across samples. In this setting, how large can N be, as a function of n, before level accuracy becomes poor? Here we answer this question in cases where the statistic under test is of Student's t type. We show that if either the normal or Student t distribution is used for calibration, then the level of the simultaneous test is accurate provided that log N increases at a strictly slower rate than n^(1/3) as n diverges. On the other hand, if bootstrap methods are used for calibration, then we may choose log N almost as large as n^(1/2) and still achieve asymptotic-level accuracy. The implications of these results are explored both theoretically and numerically.

AB - In the analysis of microarray data, and in some other contemporary statistical problems, it is not uncommon to apply hypothesis tests in a highly simultaneous way. The number, N say, of tests used can be much larger than the sample sizes, n, to which the tests are applied, yet we wish to calibrate the tests so that the overall level of the simultaneous test is accurate. Often the sampling distribution is quite different for each test, so there may not be an opportunity to combine data across samples. In this setting, how large can N be, as a function of n, before level accuracy becomes poor? Here we answer this question in cases where the statistic under test is of Student's t type. We show that if either the normal or Student t distribution is used for calibration, then the level of the simultaneous test is accurate provided that log N increases at a strictly slower rate than n^(1/3) as n diverges. On the other hand, if bootstrap methods are used for calibration, then we may choose log N almost as large as n^(1/2) and still achieve asymptotic-level accuracy. The implications of these results are explored both theoretically and numerically.

KW - Bonferroni's inequality

KW - Edgeworth expansion

KW - Genetic data

KW - Large-deviation expansion

KW - Level accuracy

KW - Microarray data

KW - Quantile estimation

KW - Skewness

KW - Student's t statistic

UR - http://www.scopus.com/inward/record.url?scp=38349049320&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=38349049320&partnerID=8YFLogxK

U2 - 10.1198/016214507000000969

DO - 10.1198/016214507000000969

M3 - Article

AN - SCOPUS:38349049320

SN - 0162-1459

VL - 102

SP - 1282

EP - 1288

JO - Journal of the American Statistical Association

JF - Journal of the American Statistical Association

IS - 480

ER -