Abstract
The problem of maximizing cell type discovery under budget constraints is a fundamental challenge for the collection and analysis of single-cell RNA-sequencing (scRNA-seq) data. In this paper we introduce a simple, computationally efficient and scalable Bayesian nonparametric sequential approach to optimize the budget allocation when designing a large-scale experiment for the collection of scRNA-seq data for purposes including, but not limited to, creating cell atlases. Our approach relies on two tools: (i) a hierarchical Pitman–Yor prior that recapitulates biological assumptions regarding cellular differentiation, and (ii) a Thompson sampling multiarmed bandit strategy that balances exploitation and exploration to prioritize experiments across a sequence of trials. Posterior inference is performed using a sequential Monte Carlo approach, which allows us to fully exploit the sequential nature of our species sampling problem. We empirically show that our approach outperforms state-of-the-art methods and achieves near-Oracle performance on simulated and scRNA-seq data alike.
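As a rough illustration of the bandit component described above, the following Python sketch runs Thompson sampling over a set of candidate experiments (arms) and rewards an arm whenever a draw reveals a previously unseen cell type. It is not the method of the paper: the hierarchical Pitman–Yor prior and the sequential Monte Carlo inference are replaced here by independent Beta-Bernoulli posteriors on each arm's discovery probability. The function name `thompson_discovery`, the `arms` format, and the toy populations are all assumptions made for illustration.

```python
import random


def thompson_discovery(arms, n_steps, seed=0):
    """Simplified Thompson sampling for cell type discovery (illustrative only).

    arms: list of dicts mapping a cell-type label to its probability in that
          population (hypothetical input format, not taken from the paper).
    Returns the set of discovered labels and the sequence of chosen arms.
    """
    rng = random.Random(seed)
    seen = set()
    hits = [0] * len(arms)    # draws from arm j that revealed a new type
    misses = [0] * len(arms)  # draws from arm j that revealed a known type
    pulls = []
    for _ in range(n_steps):
        # Sample one plausible discovery probability per arm from its
        # Beta(1 + hits, 1 + misses) posterior (uniform prior assumed).
        sampled = [
            rng.betavariate(1 + hits[j], 1 + misses[j])
            for j in range(len(arms))
        ]
        # Act greedily with respect to the sampled values.
        j = max(range(len(arms)), key=sampled.__getitem__)
        # Simulate sequencing one cell from the chosen population.
        labels, probs = zip(*arms[j].items())
        cell_type = rng.choices(labels, weights=probs, k=1)[0]
        if cell_type in seen:
            misses[j] += 1
        else:
            seen.add(cell_type)
            hits[j] += 1
        pulls.append(j)
    return seen, pulls


if __name__ == "__main__":
    # Two toy populations: one dominated by a single type, one diverse.
    toy_arms = [
        {"A": 0.9, "B": 0.1},
        {"C": 0.25, "D": 0.25, "E": 0.25, "F": 0.25},
    ]
    discovered, chosen = thompson_discovery(toy_arms, n_steps=30, seed=1)
    print(sorted(discovered), chosen)
```

Because an arm's discovery probability shrinks as its types are exhausted, a stationary Beta posterior is only a crude stand-in. A model that shares information across arms, as the hierarchical prior named in the abstract is meant to do, would be expected to adapt differently.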
Original language | English (US) |
---|---|
Pages (from-to) | 2003-2019 |
Number of pages | 17 |
Journal | Annals of Applied Statistics |
Volume | 14 |
Issue number | 4 |
State | Published - 2020 |
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Modeling and Simulation
- Statistics, Probability and Uncertainty
Keywords
- Cell type discovery
- Experimental sampling design
- Hierarchical Pitman–Yor model
- Multiarmed bandits
- scRNA-seq
- Sequential Monte Carlo
- Thompson sampling