TY - JOUR
T1 - Reconstructing genome mixtures from partial adjacencies.
AU - Mahmoody, Ahmad
AU - Kahn, Crystal L.
AU - Raphael, Benjamin J.
N1 - Funding Information:
We thank the anonymous referees for helpful comments on an earlier version of this manuscript. This work was supported by a CAREER Award from the National Science Foundation (#1053753). In addition, BJR is supported by a Career Award from the Scientific Interface from the Burroughs Wellcome Fund and an Alfred P. Sloan Research Fellowship. This article has been published as part of BMC Bioinformatics Volume 13 Supplement 19, 2012: Proceedings of the Tenth Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/ 13/S19.
PY - 2012
Y1 - 2012
N2 - Many cancer genome sequencing efforts are underway with the goal of identifying the somatic mutations that drive cancer progression. A major difficulty in these studies is that tumors are typically heterogeneous, with individual cells in a tumor having different complements of somatic mutations. However, nearly all DNA sequencing technologies sequence DNA from multiple cells, thus resulting in measurement of mutations from a mixture of genomes. Genome rearrangements are a major class of somatic mutations in many tumors, and the novel adjacencies (i.e. breakpoints) resulting from these rearrangements are readily detected from DNA sequencing reads. However, the assignment of each rearrangement, or adjacency, to an individual cancer genome in the mixture is not known. Moreover, the quantity of DNA sequence reads may be insufficient to measure all rearrangements in all genomes in the tumor. Motivated by this application, we formulate the k-minimum completion problem (k-MCP). In this problem, we aim to reconstruct k genomes derived from a single reference genome, given partial information about the adjacencies present in the mixture of these genomes. We show that the 1-MCP is solvable in linear time in the cases where: (i) the measured, incomplete genome has a single circular or linear chromosome; (ii) there are no restrictions on the chromosomal content of the measured, incomplete genome. We also show that the k-MCP problem, for k ≥ 3 in general, and the 2-MCP problem with the double-cut-and-join (DCJ) distance are NP-complete, when there are no restriction on the chromosomal structure of the measured, incomplete genome. These results lay the foundation for future algorithmic studies of the k-MCP and the application of these algorithms to real cancer sequencing data.
AB - Many cancer genome sequencing efforts are underway with the goal of identifying the somatic mutations that drive cancer progression. A major difficulty in these studies is that tumors are typically heterogeneous, with individual cells in a tumor having different complements of somatic mutations. However, nearly all DNA sequencing technologies sequence DNA from multiple cells, thus resulting in measurement of mutations from a mixture of genomes. Genome rearrangements are a major class of somatic mutations in many tumors, and the novel adjacencies (i.e. breakpoints) resulting from these rearrangements are readily detected from DNA sequencing reads. However, the assignment of each rearrangement, or adjacency, to an individual cancer genome in the mixture is not known. Moreover, the quantity of DNA sequence reads may be insufficient to measure all rearrangements in all genomes in the tumor. Motivated by this application, we formulate the k-minimum completion problem (k-MCP). In this problem, we aim to reconstruct k genomes derived from a single reference genome, given partial information about the adjacencies present in the mixture of these genomes. We show that the 1-MCP is solvable in linear time in the cases where: (i) the measured, incomplete genome has a single circular or linear chromosome; (ii) there are no restrictions on the chromosomal content of the measured, incomplete genome. We also show that the k-MCP problem, for k ≥ 3 in general, and the 2-MCP problem with the double-cut-and-join (DCJ) distance are NP-complete, when there are no restriction on the chromosomal structure of the measured, incomplete genome. These results lay the foundation for future algorithmic studies of the k-MCP and the application of these algorithms to real cancer sequencing data.
UR - http://www.scopus.com/inward/record.url?scp=84878050713&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84878050713&partnerID=8YFLogxK
U2 - 10.1186/1471-2105-13-S19-S9
DO - 10.1186/1471-2105-13-S19-S9
M3 - Article
C2 - 23282028
AN - SCOPUS:84878050713
SN - 0011-9164
VL - 13 Suppl 19
SP - S9
JO - Unknown Journal
JF - Unknown Journal
M1 - S9
ER -