TY - JOUR
T1 - NetMix
T2 - A Network-Structured Mixture Model for Reduced-Bias Estimation of Altered Subnetworks
AU - Reyna, Matthew A.
AU - Chitra, Uthsav
AU - Elyanow, Rebecca
AU - Raphael, Benjamin J.
N1 - Funding Information:
M.A.R. was supported in part by the National Cancer Institute of the NIH (Cancer Target Discovery and Development Network grant U01CA217875). B.J.R. was supported by US National Institutes of Health (NIH) grants R01HG007069 and U24CA211000.
Publisher Copyright:
© Copyright 2021, Mary Ann Liebert, Inc., publishers 2021.
PY - 2021/5
Y1 - 2021/5
N2 - A classic problem in computational biology is the identification of altered subnetworks: subnetworks of an interaction network that contain genes/proteins that are differentially expressed, highly mutated, or otherwise aberrant compared with other genes/proteins. Numerous methods have been developed to solve this problem under various assumptions, but the statistical properties of these methods are often unknown. For example, some widely used methods are reported to output very large subnetworks that are difficult to interpret biologically. In this work, we formulate the identification of altered subnetworks as the problem of estimating the parameters of a class of probability distributions that we call the Altered Subset Distribution (ASD). We derive a connection between a popular method, jActiveModules, and the maximum likelihood estimator (MLE) of the ASD. We show that the MLE is statistically biased, explaining the large subnetworks output by jActiveModules. Based on these insights, we introduce NetMix, an algorithm that uses Gaussian mixture models to obtain less biased estimates of the parameters of the ASD. We demonstrate that NetMix outperforms existing methods in identifying altered subnetworks on both simulated and real data, including the identification of differentially expressed genes from both microarray and RNA-seq experiments and the identification of cancer driver genes in somatic mutation data.
KW - altered subnetworks
KW - pathways
KW - bias
KW - biological networks
KW - cancer
KW - differential gene expression
UR - http://www.scopus.com/inward/record.url?scp=85106534027&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85106534027&partnerID=8YFLogxK
U2 - 10.1089/cmb.2020.0435
DO - 10.1089/cmb.2020.0435
M3 - Article
C2 - 33400606
AN - SCOPUS:85106534027
SN - 1066-5277
VL - 28
SP - 469
EP - 484
JO - Journal of Computational Biology
JF - Journal of Computational Biology
IS - 5
ER -