Distinguishing the somatic mutations responsible for cancer (driver mutations) from random passenger mutations is a key challenge in cancer genomics. Driver mutations generally target cellular signaling and regulatory pathways consisting of multiple genes. Because different samples carry different combinations of mutations in these driver pathways, this heterogeneity complicates the identification of driver mutations by their recurrence across samples. We introduce the Multi-Dendrix algorithm for the simultaneous de novo identification of multiple driver pathways in somatic mutation data from a cohort of cancer samples. The algorithm relies on two combinatorial properties of mutations in a driver pathway: high coverage and mutual exclusivity. We derive an integer linear program that finds sets of mutations exhibiting these properties. We apply Multi-Dendrix to somatic mutations from glioblastoma, breast cancer, and lung cancer samples. Multi-Dendrix identifies sets of mutations in genes that overlap with known pathways (including the Rb, p53, PI(3)K, and cell cycle pathways) as well as novel sets of mutually exclusive mutations, including mutations in several transcription factors and other genes involved in transcriptional regulation. These sets are discovered directly from mutation data with no prior knowledge of pathways or gene interactions. We show that Multi-Dendrix outperforms other algorithms for identifying combinations of mutations and is also orders of magnitude faster on genome-scale data. Software is available at: http://compbio.cs.brown.edu/software.
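The two properties named above, coverage and mutual exclusivity, can be illustrated with a small scoring sketch. The abstract does not state a scoring function, so the sketch assumes the weight used by the earlier Dendrix method, W(M) = 2|Γ(M)| − Σ_g |Γ(g)|, where Γ(g) is the set of samples with a mutation in gene g and Γ(M) is the union over genes in M. The gene names, sample names, and function names are hypothetical toy choices, and the code scores a single candidate gene set; it does not implement the integer linear program.

```python
from typing import Dict, Iterable, Set


def coverage(mutations: Dict[str, Set[str]], genes: Iterable[str]) -> Set[str]:
    """Return the samples that carry a mutation in at least one of the genes."""
    covered: Set[str] = set()
    for gene in genes:
        covered |= mutations.get(gene, set())
    return covered


def weight(mutations: Dict[str, Set[str]], genes: Iterable[str]) -> int:
    """Assumed Dendrix-style weight: 2 * |covered samples| - total mutation count.

    The score is high when many samples are covered (coverage) and few
    samples are mutated in more than one gene of the set (exclusivity).
    """
    gene_list = list(genes)
    covered = coverage(mutations, gene_list)
    total = sum(len(mutations.get(gene, set())) for gene in gene_list)
    return 2 * len(covered) - total


# Hypothetical toy data: gene -> set of samples in which it is mutated.
toy = {
    "GENE_A": {"s1", "s2", "s3"},
    "GENE_B": {"s4", "s5"},
    "GENE_C": {"s1", "s2", "s4"},
}

exclusive_score = weight(toy, ["GENE_A", "GENE_B"])    # no sample mutated twice
overlapping_score = weight(toy, ["GENE_A", "GENE_C"])  # s1 and s2 mutated twice
print(exclusive_score, overlapping_score)
```

In this toy cohort the mutually exclusive pair scores higher than the overlapping pair even though both contain a three-sample gene, which is the behavior the two properties are meant to reward.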
All Science Journal Classification (ASJC) codes
- Ecology, Evolution, Behavior and Systematics
- Cellular and Molecular Neuroscience
- Molecular Biology
- Computational Theory and Mathematics
- Modeling and Simulation