TY - JOUR
T1 - Identification of polymorphic inversions from genotypes
AU - Cáceres, Alejandro
AU - Sindi, Suzanne S.
AU - Raphael, Benjamin J.
AU - Cáceres, Mario
AU - González, Juan R.
N1 - Funding Information:
We would like to thank Clive Hoggart for providing us with a version of invertFREGENE that outputs the classification of each individual chromosome, and the reviewers for their useful comments. This work has been supported by the Spanish Ministry of Science and Innovation (MTM2008-02457) to JRG, the statistical genetics network (MTM2010-09526-E) to JRG and AC. BJR is supported by the National Institutes of Health (R01 HG5690) and a Burroughs Wellcome Career Award at the Scientific Interface. MC is supported by the ERC under the European Union Seventh Research Framework Programme (FP7) with Starting Grant (243212-INVFEST).
PY - 2012/2/9
Y1 - 2012/2/9
N2 - Background: Polymorphic inversions are a source of genetic variability with a direct impact on recombination frequencies. Given the difficulty of their experimental study, computational methods have been developed to infer their existence in a large number of individuals using genome-wide data of nucleotide variation. Methods based on haplotype tagging of known inversions attempt to classify individuals as having a normal or inverted allele. Other methods that measure differences in linkage disequilibrium attempt to identify regions with inversions but are unable to classify subjects accurately, an essential requirement for association studies. Results: We present a novel method to both identify polymorphic inversions from genome-wide genotype data and classify individuals as containing a normal or inverted allele. Our method, a generalization of a published method for haplotype data [1], utilizes linkage between groups of SNPs to partition a set of individuals into normal and inverted subpopulations. We employ a sliding window scan to identify regions likely to have an inversion, and accumulation of evidence from neighboring SNPs is used to accurately determine the inversion status of each subject. Further, our approach detects inversions directly from genotype data, thus increasing its usability for current genome-wide association studies (GWAS). Conclusions: We demonstrate the accuracy of our method to detect inversions and classify individuals on principled-simulated genotypes, produced by the evolution of an inversion event within a coalescent model [2]. We applied our method to real genotype data from HapMap Phase III to characterize the inversion status of two known inversions within the regions 17q21 and 8p23 across 1184 individuals. Finally, we scan the full genomes of the European Origin (CEU) and Yoruba (YRI) HapMap samples. We find population-based evidence for 9 out of 15 well-established autosomal inversions, and for 52 regions previously predicted by independent experimental methods in ten (9+1) individuals [3,4]. We provide efficient implementations of both genotype and haplotype methods as a unified R package inveRsion.
AB - Background: Polymorphic inversions are a source of genetic variability with a direct impact on recombination frequencies. Given the difficulty of their experimental study, computational methods have been developed to infer their existence in a large number of individuals using genome-wide data of nucleotide variation. Methods based on haplotype tagging of known inversions attempt to classify individuals as having a normal or inverted allele. Other methods that measure differences in linkage disequilibrium attempt to identify regions with inversions but are unable to classify subjects accurately, an essential requirement for association studies. Results: We present a novel method to both identify polymorphic inversions from genome-wide genotype data and classify individuals as containing a normal or inverted allele. Our method, a generalization of a published method for haplotype data [1], utilizes linkage between groups of SNPs to partition a set of individuals into normal and inverted subpopulations. We employ a sliding window scan to identify regions likely to have an inversion, and accumulation of evidence from neighboring SNPs is used to accurately determine the inversion status of each subject. Further, our approach detects inversions directly from genotype data, thus increasing its usability for current genome-wide association studies (GWAS). Conclusions: We demonstrate the accuracy of our method to detect inversions and classify individuals on principled-simulated genotypes, produced by the evolution of an inversion event within a coalescent model [2]. We applied our method to real genotype data from HapMap Phase III to characterize the inversion status of two known inversions within the regions 17q21 and 8p23 across 1184 individuals. Finally, we scan the full genomes of the European Origin (CEU) and Yoruba (YRI) HapMap samples. We find population-based evidence for 9 out of 15 well-established autosomal inversions, and for 52 regions previously predicted by independent experimental methods in ten (9+1) individuals [3,4]. We provide efficient implementations of both genotype and haplotype methods as a unified R package inveRsion.
UR - http://www.scopus.com/inward/record.url?scp=84874379930&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84874379930&partnerID=8YFLogxK
U2 - 10.1186/1471-2105-13-28
DO - 10.1186/1471-2105-13-28
M3 - Article
C2 - 22321652
AN - SCOPUS:84874379930
SN - 1471-2105
VL - 13
JO - BMC Bioinformatics
JF - BMC Bioinformatics
IS - 1
M1 - 28
ER -