Identifying the properties and concentrations of components from an observed mixture, known as deconvolution, is a fundamental problem in signal processing. It has diverse applications in fields ranging from hyperspectral imaging to noise cancellation in audio recordings. This paper focuses on in-silico deconvolution of signals associated with complex tissues into their constituent cell-type-specific components, together with a quantitative characterization of the cell types. Deconvolving mixed tissues/cell types is useful for removing contaminants (e.g., surrounding cells) from tumor biopsies, as well as for monitoring changes in the cell population in response to treatment or infection. In these contexts, the observed signal from the mixture of cell types is assumed to be a convolution of the expression levels of genes in the constituent cell types, generated by a linear instantaneous (LI) mixing process. The goal is to use known signals corresponding to individual cell types, together with a model of the mixing process, to cast deconvolution as a suitable optimization problem. In this paper, we present a survey and in-depth analysis of the models, methods, and assumptions underlying deconvolution techniques. We investigate the choice of loss function for evaluating estimation error, constraints on solutions, preprocessing and data filtering, feature selection, and regularization to enhance the quality of solutions, as well as the impact of these choices on the performance of commonly used regression-based methods for deconvolution. We assess different combinations of these factors and use detailed statistical measures to evaluate their effectiveness. Some of these combinations have been proposed in the literature, whereas others represent novel algorithmic choices for deconvolution. We identify shortcomings of current methods and avenues for further investigation. For many of the identified shortcomings, such as normalization issues and data filtering, we provide new solutions.
We summarize our findings in a prescriptive step-by-step process, which can be applied to a wide range of deconvolution problems.
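As an illustration of the linear mixing model described above, the following minimal sketch treats the mixture profile m as G c, where G holds reference expression levels (genes by cell types) and c holds the unknown cell-type proportions. It is not the method evaluated in the paper: the function name `deconvolve`, the toy reference matrix, the squared-error loss, the non-negativity constraint, and the projected-gradient solver are all assumptions chosen only to keep the example small and dependency-free.

```python
# Hypothetical toy example: regression-based deconvolution under a
# linear mixing model m = G c, with c >= 0 and rescaled to sum to 1.

def deconvolve(G, m, steps=500):
    """Estimate cell-type proportions by projected gradient descent on
    the squared-error loss 0.5 * ||G c - m||^2 subject to c >= 0."""
    n_genes, n_types = len(G), len(G[0])
    # The squared Frobenius norm of G upper-bounds the largest eigenvalue
    # of G^T G, so 1 / lipschitz is a safe (conservative) step size.
    lipschitz = sum(g * g for row in G for g in row)
    c = [1.0 / n_types] * n_types  # start from uniform proportions
    for _ in range(steps):
        residual = [
            sum(G[i][k] * c[k] for k in range(n_types)) - m[i]
            for i in range(n_genes)
        ]
        gradient = [
            sum(G[i][k] * residual[i] for i in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        c = [max(0.0, c[k] - gradient[k] / lipschitz) for k in range(n_types)]
    total = sum(c)
    # Rescale so the estimates can be read as proportions.
    return [x / total for x in c] if total > 0 else c


# Made-up reference profiles: 4 genes (rows) x 2 cell types (columns).
G = [
    [10.0, 1.0],
    [2.0, 8.0],
    [5.0, 5.0],
    [0.0, 6.0],
]
true_c = [0.3, 0.7]
# Noise-free synthetic mixture generated from the assumed model.
m = [sum(G[i][k] * true_c[k] for k in range(2)) for i in range(4)]

proportions = deconvolve(G, m)
print([round(p, 4) for p in proportions])
```

On real data the mixture is noisy, so the choices surveyed in the abstract (loss function, constraints, filtering, feature selection, and regularization) would replace the simple defaults assumed here.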
All Science Journal Classification (ASJC) codes
- Electrical and Electronic Engineering
Keywords
- feature selection
- gene expression
- linear regression
- loss function
- range filtering