TY - JOUR
T1 - Estimation of the false discovery proportion with unknown dependence
AU - Fan, Jianqing
AU - Han, Xu
N1 - Funding Information:
This research was partly supported by National Institutes of Health grants R01-GM072611-11 and R01GM100474-04 and National Science Foundation grant DMS-1206464. We thank Dr Weijie Gu for early assistance on this project. We also thank the Joint Editor, a past Editor, the Associate Editors and the referees for many constructive comments which significantly improve the presentation of the paper.
Publisher Copyright:
© 2016 Royal Statistical Society
PY - 2017/9
Y1 - 2017/9
AB - Large-scale multiple testing with correlated test statistics arises frequently in many areas of scientific research. Incorporating correlation information in approximating the false discovery proportion (FDP) has attracted increasing attention in recent years. When the covariance matrix of test statistics is known, Fan and his colleagues provided an accurate approximation of the FDP under arbitrary dependence structure and some sparsity assumption. However, the covariance matrix is often unknown in applications, and such dependence information must be estimated before approximating the FDP. The estimation accuracy can greatly affect the FDP approximation. In the current paper, we study theoretically the effect of unknown dependence on the testing procedure and establish a general framework under which the FDP can be well approximated. Unknown dependence affects the FDP approximation in two major ways: through the estimation of eigenvalues or eigenvectors and through the estimation of marginal variances. To address the challenges in these two aspects, we first develop general requirements on estimates of eigenvalues and eigenvectors for a good approximation of the FDP. We then give conditions on the structures of covariance matrices that satisfy such requirements. Such dependence structures include banded or sparse covariance matrices and (conditional) sparse precision matrices. Within this framework, we also consider a special example to illustrate our method, in which data are sampled from an approximate factor model, a model that encompasses most practical situations. We provide a good approximation of the FDP by exploiting this specific dependence structure. The results are further generalized to the situation where the multivariate normality assumption is relaxed. Our results are demonstrated by simulation studies and some real data applications.
KW - Approximate factor model
KW - Dependent test statistics
KW - False discovery proportion
KW - Large-scale multiple testing
KW - Unknown covariance matrix
UR - http://www.scopus.com/inward/record.url?scp=84992453295&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84992453295&partnerID=8YFLogxK
U2 - 10.1111/rssb.12204
DO - 10.1111/rssb.12204
M3 - Article
C2 - 29056863
AN - SCOPUS:84992453295
SN - 1369-7412
VL - 79
SP - 1143
EP - 1164
JO - Journal of the Royal Statistical Society. Series B: Statistical Methodology
JF - Journal of the Royal Statistical Society. Series B: Statistical Methodology
IS - 4
ER -