TY - JOUR
T1 - A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data
AU - Bipolar Disorders Working Group of the Psychiatric Genomics Consortium
AU - Rangan, Aaditya V.
AU - McGrouther, Caroline C.
AU - Kelsoe, John
AU - Schork, Nicholas
AU - Stahl, Eli
AU - Zhu, Qian
AU - Krishnan, Arjun
AU - Yao, Vicky
AU - Troyanskaya, Olga G.
AU - Bilaloglu, Seda
AU - Raghavan, Preeti
AU - Bergen, Sarah
AU - Jureus, Anders
AU - Landen, Mikael
N1 - Publisher Copyright: © 2018 Rangan et al. http://creativecommons.org/licenses/by/4.0/
PY - 2018/5
Y1 - 2018/5
AB - A common goal in data analysis is to sift through a large data matrix and detect any significant submatrices (i.e., biclusters) that have a low numerical rank. We present a simple algorithm for tackling this biclustering problem. Our algorithm accumulates information about 2-by-2 submatrices (i.e., ‘loops’) within the data matrix, and focuses on rows and columns of the data matrix that participate in an abundance of low-rank loops. We demonstrate, through analysis and numerical experiments, that this loop-counting method performs well in a variety of scenarios, outperforming simple spectral methods in many situations of interest. Another important feature of our method is that it can easily be modified to account for aspects of experimental design that commonly arise in practice. For example, our algorithm can be modified to correct for controls, categorical and continuous covariates, and sparsity within the data. We demonstrate these practical features with two examples: the first drawn from gene-expression analysis and the second drawn from a much larger genome-wide association study (GWAS).
UR - http://www.scopus.com/inward/record.url?scp=85048172661&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85048172661&partnerID=8YFLogxK
U2 - 10.1371/journal.pcbi.1006105
DO - 10.1371/journal.pcbi.1006105
M3 - Article
C2 - 29758032
AN - SCOPUS:85048172661
SN - 1553-734X
VL - 14
JO - PLoS Computational Biology
JF - PLoS Computational Biology
IS - 5
M1 - e1006105
ER -