Abstract
High-throughput RNA-sequencing (RNA-seq) technologies are powerful tools for understanding cellular state. Often, it is of interest to quantify and to summarize changes in cell state that occur between experimental or biological conditions. Differential expression is typically assessed using univariate tests to measure genewise shifts in expression. However, these methods largely ignore changes in transcriptional correlation. Furthermore, there is a need to identify the low-dimensional structure of the gene expression shift to identify collections of genes that change between conditions. Here, we propose contrastive latent variable models designed for count data to create a richer portrait of differential expression in sequencing data. These models disentangle the sources of transcriptional variation in different conditions in the context of an explicit model of variation at baseline. Moreover, we develop a model-based hypothesis testing framework that can test for global and gene subset-specific changes in expression. We evaluate our model through extensive simulations and analyses with count-based gene expression data from perturbation and observational sequencing experiments. We find that our methods effectively summarize and quantify complex transcriptional changes in case-control experimental sequencing data.
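To make the contrastive setup in the abstract concrete, the following is a minimal simulation sketch, not the authors' model or code. It assumes a Poisson likelihood with nonnegative, gamma-distributed loadings and factors: background (baseline) samples are driven only by shared loadings, while foreground (case) samples receive an additional foreground-specific term. The function name `simulate_contrastive_counts`, the dimensions, and the gamma hyperparameters are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_contrastive_counts(n_bg=200, n_fg=200, n_genes=50,
                                k_shared=2, k_fg=2, seed=0):
    """Simulate count data from an illustrative contrastive latent variable model.

    Assumption (not taken from the paper): Poisson counts whose rates are
    nonnegative linear combinations of gamma-distributed latent factors.
    """
    rng = np.random.default_rng(seed)

    # Loadings: S captures variation shared by both conditions (baseline),
    # W captures variation present only in the foreground condition.
    S = rng.gamma(shape=2.0, scale=1.0, size=(k_shared, n_genes))
    W = rng.gamma(shape=2.0, scale=1.0, size=(k_fg, n_genes))

    # Per-sample latent factors.
    z_bg = rng.gamma(shape=2.0, scale=1.0, size=(n_bg, k_shared))
    z_fg = rng.gamma(shape=2.0, scale=1.0, size=(n_fg, k_shared))
    t_fg = rng.gamma(shape=2.0, scale=1.0, size=(n_fg, k_fg))

    # Background rates use shared structure only; foreground rates add
    # the foreground-specific component on top of it.
    rate_bg = z_bg @ S
    rate_fg = z_fg @ S + t_fg @ W

    Y = rng.poisson(rate_bg)  # background counts, shape (n_bg, n_genes)
    X = rng.poisson(rate_fg)  # foreground counts, shape (n_fg, n_genes)
    return X, Y, S, W


if __name__ == "__main__":
    X, Y, S, W = simulate_contrastive_counts()
    print("foreground:", X.shape, "background:", Y.shape)
    print("mean foreground count:", X.mean(), "mean background count:", Y.mean())
```

In this sketch the foreground-specific loadings `W` play the role of the low-dimensional expression shift: genes with large entries in `W` change together between conditions, which a genewise univariate test would report only one gene at a time.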
Original language | English (US) |
---|---|
Pages (from-to) | 1268-1291 |
Number of pages | 24 |
Journal | Annals of Applied Statistics |
Volume | 16 |
Issue number | 3 |
State | Published - Sep 2022 |
Externally published | Yes |
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Modeling and Simulation
- Statistics, Probability and Uncertainty
Keywords
- Latent variable models
- RNA sequencing
- Case-control data
- Contrastive models
- Differential expression