TY - GEN
T1 - NetMix
T2 - 24th Annual Conference on Research in Computational Molecular Biology, RECOMB 2020
AU - Reyna, Matthew A.
AU - Chitra, Uthsav
AU - Elyanow, Rebecca
AU - Raphael, Benjamin J.
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2020.
PY - 2020
Y1 - 2020
N2 - A classic problem in computational biology is the identification of altered subnetworks: subnetworks of an interaction network that contain genes/proteins that are differentially expressed, highly mutated, or otherwise aberrant compared to other genes/proteins. Numerous methods have been developed to solve this problem under various assumptions, but the statistical properties of these methods are often unknown. For example, some widely-used methods are reported to output very large subnetworks that are difficult to interpret biologically. In this work, we formulate the identification of altered subnetworks as the problem of estimating the parameters of a class of probability distributions which we call the Altered Subset Distribution (ASD). We derive a connection between a popular method, jActiveModules, and the maximum likelihood estimator (MLE) of the ASD. We show that the MLE is statistically biased, explaining the large subnetworks output by jActiveModules. We introduce NetMix, an algorithm that uses Gaussian mixture models to obtain less biased estimates of the parameters of the ASD. We demonstrate that NetMix outperforms existing methods in identifying altered subnetworks on both simulated and real data, including the identification of differentially expressed genes from both microarray and RNA-seq experiments and the identification of cancer driver genes in somatic mutation data.
KW - Cancer
KW - Gene expression
KW - Interaction networks
KW - Maximum likelihood estimation
KW - Network anomaly
UR - http://www.scopus.com/inward/record.url?scp=85084251064&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084251064&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-45257-5_11
DO - 10.1007/978-3-030-45257-5_11
M3 - Conference contribution
AN - SCOPUS:85084251064
SN - 9783030452568
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 169
EP - 185
BT - Research in Computational Molecular Biology - 24th Annual International Conference, RECOMB 2020, Proceedings
A2 - Schwartz, Russell
PB - Springer
Y2 - 10 May 2020 through 13 May 2020
ER -