TY - JOUR
T1 - Finding driver pathways in cancer
T2 - Models and algorithms
AU - Vandin, Fabio
AU - Upfal, Eli
AU - Raphael, Benjamin J.
N1 - Funding Information:
We thank the anonymous reviewers for helpful suggestions that improved the manuscript. A preliminary version of this work was presented at WABI 2011. This work is supported by NSF grants IIS-1016648 and CCF-1023160, the Alfred P. Sloan Foundation, and in part by the MIUR of Italy under project AlgoDEEP port. 2008TFBWL4. BJR is also supported by an NSF CAREER Award (CCF-1053753) and a Career Award at the Scientific Interface from the Burroughs Wellcome Fund. The results published here are in whole or part based upon data generated by The Cancer Genome Atlas pilot project established by the NCI and NHGRI. Information about TCGA and the investigators and institutions who constitute the TCGA research network can be found at http://cancergenome.nih.gov/.
PY - 2012/9/6
Y1 - 2012/9/6
N2 - Background: Cancer sequencing projects are now measuring somatic mutations in large numbers of cancer genomes. A key challenge in interpreting these data is to distinguish driver mutations, mutations important for cancer development, from passenger mutations that have accumulated in somatic cells but without functional consequences. A common approach to identify genes harboring driver mutations is a single gene test that identifies individual genes that are recurrently mutated in a significant number of cancer genomes. However, the power of this test is reduced by: (1) the necessity of estimating the background mutation rate (BMR) for each gene; (2) the mutational heterogeneity in most cancers meaning that groups of genes (e.g. pathways), rather than single genes, are the primary target of mutations. Results: We investigate the problem of discovering driver pathways, groups of genes containing driver mutations, directly from cancer mutation data and without prior knowledge of pathways or other interactions between genes. We introduce two generative models of somatic mutations in cancer and study the algorithmic complexity of discovering driver pathways in both models. We show that a single gene test for driver genes is highly sensitive to the estimate of the BMR. In contrast, we show that an algorithmic approach that maximizes a straightforward measure of the mutational properties of a driver pathway successfully discovers these groups of genes without an estimate of the BMR. Moreover, this approach is also successful in the case when the observed frequencies of passenger and driver mutations are indistinguishable, a situation where single gene tests fail. Conclusions: Accurate estimation of the BMR is a challenging task. Thus, methods that do not require an estimate of the BMR, such as the ones we provide here, can give increased power for the discovery of driver genes.
AB - Background: Cancer sequencing projects are now measuring somatic mutations in large numbers of cancer genomes. A key challenge in interpreting these data is to distinguish driver mutations, mutations important for cancer development, from passenger mutations that have accumulated in somatic cells but without functional consequences. A common approach to identify genes harboring driver mutations is a single gene test that identifies individual genes that are recurrently mutated in a significant number of cancer genomes. However, the power of this test is reduced by: (1) the necessity of estimating the background mutation rate (BMR) for each gene; (2) the mutational heterogeneity in most cancers meaning that groups of genes (e.g. pathways), rather than single genes, are the primary target of mutations. Results: We investigate the problem of discovering driver pathways, groups of genes containing driver mutations, directly from cancer mutation data and without prior knowledge of pathways or other interactions between genes. We introduce two generative models of somatic mutations in cancer and study the algorithmic complexity of discovering driver pathways in both models. We show that a single gene test for driver genes is highly sensitive to the estimate of the BMR. In contrast, we show that an algorithmic approach that maximizes a straightforward measure of the mutational properties of a driver pathway successfully discovers these groups of genes without an estimate of the BMR. Moreover, this approach is also successful in the case when the observed frequencies of passenger and driver mutations are indistinguishable, a situation where single gene tests fail. Conclusions: Accurate estimation of the BMR is a challenging task. Thus, methods that do not require an estimate of the BMR, such as the ones we provide here, can give increased power for the discovery of driver genes.
KW - Background mutation rate
KW - Cancer
KW - Driver mutations
KW - Generative models
KW - Pathways
KW - Somatic Mutations
UR - http://www.scopus.com/inward/record.url?scp=84865711602&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84865711602&partnerID=8YFLogxK
U2 - 10.1186/1748-7188-7-23
DO - 10.1186/1748-7188-7-23
M3 - Article
C2 - 22954134
AN - SCOPUS:84865711602
SN - 1748-7188
VL - 7
JO - Algorithms for Molecular Biology
JF - Algorithms for Molecular Biology
IS - 1
M1 - 23
ER -