TY - JOUR
T1 - Optimal marker gene selection for cell type discrimination in single cell analyses
AU - Dumitrascu, Bianca
AU - Villar, Soledad
AU - Mixon, Dustin G.
AU - Engelhardt, Barbara E.
N1 - Funding Information:
This work was performed while S.V. was with New York University, and B.D. was with Princeton University. S.V. was partially supported by NSF DMS 2044349, EOARD FA9550-18-1-7007, and the NSF-Simons Research Collaboration on the Mathematical and Scientific Foundations of Deep Learning (MoDL; NSF DMS 2031985). D.G.M. was partially supported by AFOSR FA9550-18-1-0107 and NSF DMS 1829955. B.E.E. was supported by a grant from the Helmsley Trust, a grant from the NIH Human Tumor Atlas Research Program, NIH NHLBI R01 HL133218, and NSF CAREER AWD1005627.
Publisher Copyright:
© 2021, The Author(s).
PY - 2021/12/1
Y1 - 2021/12/1
N2 - Single-cell technologies characterize complex cell populations across multiple data modalities at unprecedented scale and resolution. Multi-omic data for single cell gene expression, in situ hybridization, or single cell chromatin states are increasingly available across diverse tissue types. When isolating specific cell types from a sample of disassociated cells or performing in situ sequencing in collections of heterogeneous cells, one challenging task is to select a small set of informative markers that robustly enable the identification and discrimination of specific cell types or cell states as precisely as possible. Given single cell RNA-seq data and a set of cellular labels to discriminate, scGeneFit selects gene markers that jointly optimize cell label recovery using label-aware compressive classification methods. This results in a substantially more robust and less redundant set of markers than existing methods, most of which identify markers that separate each cell label from the rest. When applied to a data set given a hierarchy of cell types as labels, the markers found by our method improves the recovery of the cell type hierarchy with fewer markers than existing methods using a computationally efficient and principled optimization.
AB - Single-cell technologies characterize complex cell populations across multiple data modalities at unprecedented scale and resolution. Multi-omic data for single cell gene expression, in situ hybridization, or single cell chromatin states are increasingly available across diverse tissue types. When isolating specific cell types from a sample of disassociated cells or performing in situ sequencing in collections of heterogeneous cells, one challenging task is to select a small set of informative markers that robustly enable the identification and discrimination of specific cell types or cell states as precisely as possible. Given single cell RNA-seq data and a set of cellular labels to discriminate, scGeneFit selects gene markers that jointly optimize cell label recovery using label-aware compressive classification methods. This results in a substantially more robust and less redundant set of markers than existing methods, most of which identify markers that separate each cell label from the rest. When applied to a data set given a hierarchy of cell types as labels, the markers found by our method improves the recovery of the cell type hierarchy with fewer markers than existing methods using a computationally efficient and principled optimization.
UR - http://www.scopus.com/inward/record.url?scp=85101245995&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85101245995&partnerID=8YFLogxK
U2 - 10.1038/s41467-021-21453-4
DO - 10.1038/s41467-021-21453-4
M3 - Article
C2 - 33608535
AN - SCOPUS:85101245995
SN - 2041-1723
VL - 12
JO - Nature Communications
JF - Nature Communications
IS - 1
M1 - 1186
ER -