TY - JOUR
T1 - Analysis of population structure
T2 - A unifying framework and novel methods based on sparse factor analysis
AU - Engelhardt Martin, Barbara
AU - Stephens, Matthew
N1 - Copyright: Copyright 2011 Elsevier B.V., All rights reserved.
PY - 2010/9
Y1 - 2010/9
N2 - We consider the statistical analysis of population structure using genetic data. We show how the two most widely used approaches to modeling population structure, admixture-based models and principal components analysis (PCA), can be viewed within a single unifying framework of matrix factorization. Specifically, they can both be interpreted as approximating an observed genotype matrix by a product of two lower-rank matrices, but with different constraints or prior distributions on these lower-rank matrices. This opens the door to a large range of possible approaches to analyzing population structure, by considering other constraints or priors. In this paper, we introduce one such novel approach, based on sparse factor analysis (SFA). We investigate the effects of the different types of constraint in several real and simulated data sets. We find that SFA produces similar results to admixture-based models when the samples are descended from a few well-differentiated ancestral populations and can recapitulate the results of PCA when the population structure is more "continuous," as in isolation-by-distance models.
UR - http://www.scopus.com/inward/record.url?scp=78049415423&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78049415423&partnerID=8YFLogxK
U2 - 10.1371/journal.pgen.1001117
DO - 10.1371/journal.pgen.1001117
M3 - Article
C2 - 20862358
AN - SCOPUS:78049415423
VL - 6
JO - PLoS Genetics
JF - PLoS Genetics
SN - 1553-7390
IS - 9
M1 - e1001117
ER -