TY - GEN
T1 - On the sample complexity of cancer pathways identification
AU - Vandin, Fabio
AU - Raphael, Benjamin J.
AU - Upfal, Eli
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2015.
PY - 2015
Y1 - 2015
N2 - In this work we propose a framework to analyze the sample complexity of problems that arise in the study of genomic datasets. Our framework is based on tools from combinatorial analysis and statistical learning theory that have been used for the analysis of machine learning and probably approximately correct (PAC) learning. We use our framework to analyze the problem of the identification of cancer pathways through mutual exclusivity analysis of mutations from large cancer sequencing studies. We analytically derive matching upper and lower bounds on the sample complexity of the problem, showing that sample sizes much larger than currently available may be required to identify all the cancer genes in a pathway. We also provide two algorithms to find a cancer pathway from a large genomic dataset. On simulated and cancer data, we show that our algorithms can be used to identify cancer pathways from large genomic datasets.
UR - http://www.scopus.com/inward/record.url?scp=84926371985&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84926371985&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-16706-0_33
DO - 10.1007/978-3-319-16706-0_33
M3 - Conference contribution
AN - SCOPUS:84926371985
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 326
EP - 337
BT - Research in Computational Molecular Biology - 19th Annual International Conference, RECOMB 2015, Proceedings
A2 - Przytycka, Teresa M.
PB - Springer Verlag
T2 - 19th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2015
Y2 - 12 April 2015 through 15 April 2015
ER -