Recent genome sequencing studies have shown that the somatic mutations that drive cancer development are distributed across a large number of genes. This mutational heterogeneity complicates efforts to distinguish functional mutations from sporadic, passenger mutations. Since cancer mutations are hypothesized to target a relatively small number of cellular signaling and regulatory pathways, a common practice is to assess whether known pathways are enriched for mutated genes. We introduce an alternative approach that examines mutated genes in the context of a genome-scale gene interaction network. We present a computationally efficient strategy for de novo identification of subnetworks in an interaction network that are mutated in a statistically significant number of patients. This framework includes two major components. First, we use a diffusion process on the interaction network to define a local neighborhood of "influence" for each mutated gene in the network. Second, we derive a two-stage multiple hypothesis test to bound the false discovery rate (FDR) associated with the identified subnetworks. We test these algorithms on a large human protein-protein interaction network using somatic mutation data from glioblastoma and lung adenocarcinoma samples. We successfully recover pathways that are known to be important in these cancers and also identify additional pathways that have been implicated in other cancers but not previously reported as mutated in these samples. We anticipate that our approach will find increasing use as cancer genome studies increase in size and scope.
All Science Journal Classification (ASJC) codes
- Modeling and Simulation
- Molecular Biology
- Computational Mathematics
- Computational Theory and Mathematics
- cancer genomes
- statistical significance