TY - JOUR
T1 - Posterior predictive checks to quantify lack-of-fit in admixture models of latent population structure
AU - Mimno, David
AU - Blei, David M.
AU - Engelhardt Martin, Barbara
N1 - Publisher Copyright: © 2015, National Academy of Sciences. All rights reserved.
PY - 2015/6/30
Y1 - 2015/6/30
N2 - Admixture models are a ubiquitous approach to capture latent population structure in genetic samples. Despite the widespread application of admixture models, little thought has been devoted to the quality of the model fit or the accuracy of the estimates of parameters of interest for a particular study. Here we develop methods for validating admixture models based on posterior predictive checks (PPCs), a Bayesian method for assessing the quality of fit of a statistical model to a specific dataset. We develop PPCs for five population-level statistics of interest: within-population genetic variation, background linkage disequilibrium, number of ancestral populations, between-population genetic variation, and the downstream use of admixture parameters to correct for population structure in association studies. Using PPCs, we evaluate the quality of the admixture model fit to four qualitatively different population genetic datasets: the population reference sample (POPRES) European individuals, the HapMap phase 3 individuals, continental Indians, and African American individuals. We found that the same model fitted to different genomic studies resulted in highly study-specific results when evaluated using PPCs, illustrating the utility of PPCs for model-based analyses in large genomic studies.
KW - Admixture models
KW - Genomic data
KW - Model checking
KW - Population structure
KW - Posterior predictive checks
UR - http://www.scopus.com/inward/record.url?scp=84937844125&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84937844125&partnerID=8YFLogxK
U2 - 10.1073/pnas.1412301112
DO - 10.1073/pnas.1412301112
M3 - Article
C2 - 26071445
AN - SCOPUS:84937844125
SN - 0027-8424
VL - 112
SP - E3441-E3450
JO - Proceedings of the National Academy of Sciences of the United States of America
JF - Proceedings of the National Academy of Sciences of the United States of America
IS - 26
ER -