ScAlign: A tool for alignment, integration, and rare cell identification from scRNA-seq data

Nelson Johansen, Gerald Quon

Research output: Contribution to journal (Article)

4 Scopus citations


scRNA-seq dataset integration occurs in different contexts, such as the identification of cell type-specific differences in gene expression across conditions or species, or batch effect correction. We present scAlign, an unsupervised deep learning method for data integration that can incorporate partial, overlapping, or a complete set of cell labels, and estimate per-cell differences in gene expression across datasets. scAlign performance is state-of-the-art and robust to cross-dataset variation in cell type-specific expression and cell type composition. We demonstrate that scAlign reveals gene expression programs for rare populations of malaria parasites. Our framework is widely applicable to integration challenges in other domains.

Original language: English (US)
Article number: 166
Journal: Genome Biology
Issue number: 1
State: Published - Aug 14 2019

Keywords

  • Alignment
  • Batch effects
  • Data harmonization
  • Data integration
  • Deep learning
  • Domain adaptation
  • Neural networks
  • Response to stimulus
  • scRNA-seq

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Genetics
  • Cell Biology
