Atlas of primary cell-type-specific sequence models of gene expression and variant effects

Ksenia Sokolova, Chandra L. Theesfeld, Aaron K. Wong, Zijun Zhang, Kara Dolinski, Olga G. Troyanskaya

Research output: Contribution to journal › Article › peer-review


Human biology is rooted in highly specialized cell types programmed by a common genome, 98% of which is outside of genes. Genetic variation in the enormous noncoding space is linked to the majority of disease risk. To address the problem of linking these variants to expression changes in primary human cells, we introduce ExPectoSC, an atlas of modular deep-learning-based models for predicting cell-type-specific gene expression directly from sequence. We provide models for 105 primary human cell types covering 7 organ systems, demonstrate their accuracy, and then apply them to prioritize relevant cell types for complex human diseases. The resulting atlas of sequence-based gene expression and variant effects is publicly available in a user-friendly interface and readily extensible to any primary cell types. We demonstrate the accuracy of our approach through systematic evaluations and apply the models to prioritize ClinVar clinical variants of uncertain significance, verifying our top predictions experimentally.

Original language: English (US)
Article number: 100580
Journal: Cell Reports Methods
Issue number: 9
State: Published - Sep 25 2023

All Science Journal Classification (ASJC) codes

  • Genetics
  • Biochemistry, Genetics and Molecular Biology (miscellaneous)
  • Biochemistry
  • Radiology Nuclear Medicine and imaging
  • Biotechnology
  • Computer Science Applications


Keywords

  • CP: Systems biology
  • deep learning
  • functional genomics
  • gene expression prediction
  • human disease
  • variant effects

