At present there is tremendous interest in characterizing the magnitude and distribution of linkage disequilibrium (LD) throughout the human genome, which will provide the necessary foundation for genome-wide LD analyses and facilitate detailed evolutionary studies. To this end, a human high-density single-nucleotide polymorphism (SNP) marker map has been constructed. Many of the SNPs on this map, however, were identified by sampling a small number of chromosomes from a single population, and inferences drawn from studies using such SNPs may be influenced by ascertainment bias (AB). Through extensive simulations, we have found that AB is a potentially significant problem in estimating and comparing LD within and between populations. Specifically, the magnitude of AB is a function of the SNP discovery strategy, number of chromosomes used for SNP discovery, population genetic characteristics of the particular genomic region considered, amount of gene flow between populations, and demographic history of the populations. We demonstrate that a balanced SNP discovery strategy (where equal numbers of chromosomes are sampled from multiple subpopulations) is the optimal study design for generating broadly applicable SNP resources. Finally, we validate our theoretical predictions by comparing our results to publicly available data from ten genes sequenced in 24 African American and 23 European American individuals.
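As a rough illustration of why the number of chromosomes used for SNP discovery matters, the sketch below computes two textbook quantities: the probability that a biallelic site is seen as polymorphic in a small discovery panel, and the pairwise LD statistic r^2 from phased haplotypes. It is not the simulation framework used in the study. The function names `discovery_probability` and `r_squared` are hypothetical, and the discovery model (random sampling of chromosomes from a single population, with a site "discovered" whenever both alleles appear in the panel) is a simplifying assumption.

```python
from typing import Sequence, Tuple


def discovery_probability(freq: float, panel_size: int) -> float:
    """Chance that a biallelic site looks polymorphic in a discovery panel.

    Assumes (hypothetically) that `panel_size` chromosomes are drawn at
    random from one population in which one allele has frequency `freq`.
    The site is missed only if every sampled chromosome carries the same allele.
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must lie in [0, 1]")
    if panel_size < 1:
        raise ValueError("panel_size must be at least 1")
    return 1.0 - freq ** panel_size - (1.0 - freq) ** panel_size


def r_squared(haplotypes: Sequence[Tuple[int, int]]) -> float:
    """Squared allelic correlation r^2 between two biallelic loci.

    Each haplotype is a pair (a, b) of 0/1 allele codes at locus A and locus B.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return d * d / denom


if __name__ == "__main__":
    # Compare detection chances for a rare and a common variant in a 4-chromosome panel.
    for freq in (0.05, 0.5):
        print(freq, round(discovery_probability(freq, 4), 3))
    # Perfectly associated loci versus independent loci.
    print(r_squared([(1, 1), (1, 1), (0, 0), (0, 0)]))
    print(r_squared([(1, 1), (1, 0), (0, 1), (0, 0)]))
```

Under these assumptions, a panel of four chromosomes detects a variant at 5% frequency with probability of about 0.19, against 0.875 for a variant at 50% frequency. A small, single-population panel is therefore enriched for common variants, which is one route by which the discovery strategy can distort downstream LD estimates.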
All Science Journal Classification (ASJC) codes
- Ecology, Evolution, Behavior and Systematics
- Molecular Biology
Keywords
- Ascertainment bias
- Linkage disequilibrium