On the sample complexity of cancer pathways identification

Fabio Vandin, Benjamin J. Raphael, Eli Upfal

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution



In this work we propose a framework to analyze the sample complexity of problems that arise in the study of genomic datasets. Our framework is based on tools from combinatorial analysis and statistical learning theory that have been used for the analysis of machine learning and probably approximately correct (PAC) learning. We use our framework to analyze the problem of identifying cancer pathways through mutual exclusivity analysis of mutations from large cancer sequencing studies. We analytically derive matching upper and lower bounds on the sample complexity of the problem, showing that sample sizes much larger than those currently available may be required to identify all the cancer genes in a pathway. We also provide two algorithms to find a cancer pathway from a large genomic dataset. On simulated and cancer data, we show that our algorithms can be used to identify cancer pathways from large genomic datasets.
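For readers unfamiliar with mutual exclusivity analysis: the idea is that genes in the same cancer pathway tend to be mutated in many samples overall (high coverage) but rarely together in the same sample (high exclusivity). The sketch below only illustrates these two quantities on a binary samples-by-genes mutation matrix. The function name `coverage_and_exclusivity` and the exact definitions used are hypothetical choices for illustration; this is not the statistic, sample-complexity bound, or either of the algorithms from the paper.

```python
from typing import Dict, List, Sequence


def coverage_and_exclusivity(
    matrix: Sequence[Sequence[int]],
    genes: Sequence[str],
    gene_set: Sequence[str],
) -> Dict[str, float]:
    """Hypothetical illustration of mutual exclusivity for a candidate gene set.

    matrix: rows are samples, columns are genes; 1 means the gene is mutated.
    coverage: fraction of samples with at least one mutation in gene_set.
    exclusivity: among covered samples, fraction mutated in exactly one gene.
    """
    cols: List[int] = [genes.index(g) for g in gene_set]
    n_samples = len(matrix)
    covered = 0
    exclusive = 0
    for row in matrix:
        hits = sum(row[c] for c in cols)  # mutated genes of the set in this sample
        if hits >= 1:
            covered += 1
        if hits == 1:
            exclusive += 1
    coverage = covered / n_samples if n_samples else 0.0
    exclusivity = exclusive / covered if covered else 0.0
    return {"coverage": coverage, "exclusivity": exclusivity}


if __name__ == "__main__":
    # Toy data (made up for illustration): 5 samples, 3 genes.
    genes = ["A", "B", "C"]
    matrix = [
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
        [1, 1, 0],
        [0, 0, 0],
    ]
    print(coverage_and_exclusivity(matrix, genes, ["A", "B"]))
```

On this toy matrix, the set {A, B} covers 3 of 5 samples, and 2 of those 3 samples carry a mutation in exactly one of the two genes. Adding gene C raises coverage to 4 of 5 while keeping the single overlapping sample, which is the kind of trade-off a mutual exclusivity search has to score.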

Original language: English (US)
Title of host publication: Research in Computational Molecular Biology - 19th Annual International Conference, RECOMB 2015, Proceedings
Editors: Teresa M. Przytycka
Publisher: Springer Verlag
Number of pages: 12
ISBN (Electronic): 9783319167053
State: Published - 2015
Externally published: Yes
Event: 19th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2015 - Warsaw, Poland
Duration: Apr 12 2015 – Apr 15 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Other: 19th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2015

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science

