TY - JOUR
T1 - Efficient algorithms for analyzing segmental duplications with deletions and inversions in genomes
AU - Kahn, Crystal L.
AU - Mozes, Shay
AU - Raphael, Benjamin J.
N1 - Funding Information: SM was supported by NSF Grant CCF-0635089. BJR is supported by a Career Award at the Scientific Interface from the Burroughs Wellcome Fund and by funding from the ADVANCE Program at Brown University, under NSF Grant No. 0548311.
PY - 2010/1/4
Y1 - 2010/1/4
N2 - Background: Segmental duplications, or low-copy repeats, are common in mammalian genomes. In the human genome, most segmental duplications are mosaics composed of multiple duplicated fragments. This complex genomic organization complicates analysis of the evolutionary history of these sequences. One model proposed to explain these mosaic patterns is a model of repeated aggregation and subsequent duplication of genomic sequences. Results: We describe a polynomial-time exact algorithm to compute duplication distance, a genomic distance defined as the most parsimonious way to build a target string by repeatedly copying substrings of a fixed source string. This distance models the process of repeated aggregation and duplication. We also describe extensions of this distance to include certain types of substring deletions and inversions. Finally, we provide a description of a sequence of duplication events as a context-free grammar (CFG). Conclusion: These new genomic distances will permit more biologically realistic analyses of segmental duplications in genomes.
AB - Background: Segmental duplications, or low-copy repeats, are common in mammalian genomes. In the human genome, most segmental duplications are mosaics composed of multiple duplicated fragments. This complex genomic organization complicates analysis of the evolutionary history of these sequences. One model proposed to explain these mosaic patterns is a model of repeated aggregation and subsequent duplication of genomic sequences. Results: We describe a polynomial-time exact algorithm to compute duplication distance, a genomic distance defined as the most parsimonious way to build a target string by repeatedly copying substrings of a fixed source string. This distance models the process of repeated aggregation and duplication. We also describe extensions of this distance to include certain types of substring deletions and inversions. Finally, we provide a description of a sequence of duplication events as a context-free grammar (CFG). Conclusion: These new genomic distances will permit more biologically realistic analyses of segmental duplications in genomes.
UR - http://www.scopus.com/inward/record.url?scp=76749147895&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=76749147895&partnerID=8YFLogxK
U2 - 10.1186/1748-7188-5-11
DO - 10.1186/1748-7188-5-11
M3 - Article
C2 - 20047668
AN - SCOPUS:76749147895
SN - 1748-7188
VL - 5
JO - Algorithms for Molecular Biology
JF - Algorithms for Molecular Biology
IS - 1
M1 - 11
ER -