TY - JOUR
T1 - Statistical analysis of big data on pharmacogenomics
AU - Fan, Jianqing
AU - Liu, Han
N1 - Funding Information:
We thank Rongling Wu for his helpful comments and discussions. Jianqing Fan is supported by NSF Grant DMS-1206464 and NIH Grants R01GM100474 and R01-GM072611. Han Liu is supported by NSF Grant III-1116730 and an NIH sub-award from Johns Hopkins University.
PY - 2013/6/30
Y1 - 2013/6/30
N2 - This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods: estimation of large covariance matrices for understanding correlation structure, estimation of inverse covariance matrices for network modeling, large-scale simultaneous tests for selecting significantly differentially expressed genes and proteins and genetic markers for complex diseases, and high-dimensional variable selection for identifying molecules important to understanding molecular mechanisms in pharmacogenomics. Applications to gene network estimation and biomarker selection illustrate the power of these methods. Several new challenges of big data analysis, including complex data distributions, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed.
AB - This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods: estimation of large covariance matrices for understanding correlation structure, estimation of inverse covariance matrices for network modeling, large-scale simultaneous tests for selecting significantly differentially expressed genes and proteins and genetic markers for complex diseases, and high-dimensional variable selection for identifying molecules important to understanding molecular mechanisms in pharmacogenomics. Applications to gene network estimation and biomarker selection illustrate the power of these methods. Several new challenges of big data analysis, including complex data distributions, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed.
KW - Approximate factor model
KW - Big data
KW - Graphical model
KW - High dimensional statistics
KW - Marginal screening
KW - Multiple testing
KW - Robust statistics
KW - Variable selection
UR - http://www.scopus.com/inward/record.url?scp=84879554959&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84879554959&partnerID=8YFLogxK
U2 - 10.1016/j.addr.2013.04.008
DO - 10.1016/j.addr.2013.04.008
M3 - Review article
C2 - 23602905
AN - SCOPUS:84879554959
SN - 0169-409X
VL - 65
SP - 987
EP - 1000
JO - Advanced Drug Delivery Reviews
JF - Advanced Drug Delivery Reviews
IS - 7
ER -