Integrated analysis of multimodal single-cell data (version 2)

Yuhan Hao, Stephanie Hao, Erica Andersen-Nissen, William M. Mauck III, Shiwei Zheng, Andrew Butler, Maddie J. Lee, Aaron J. Wilk, Charlotte Darby, Michael Zager, Paul Hoffman, Marlon Stoeckius, Efthymia Papalexi, Eleni P. Mimitou, Jaison Jain, Avi Srivastava, Tim Stuart, Lamar M. Fleming, Bertrand Yeung, Angela J. Rogers, Juliana M. McElrath, Catherine A. Blish, Raphael Gottardo, Peter Smibert, Rahul Satija

Abstract

The simultaneous measurement of multiple modalities represents an exciting frontier for single-cell genomics and necessitates computational methods that can define cellular states based on multimodal data. Here, we introduce “weighted-nearest neighbor” analysis, an unsupervised framework to learn the relative utility of each data type in each cell, enabling an integrative analysis of multiple modalities. We apply our procedure to a CITE-seq dataset of 211,000 human peripheral blood mononuclear cells (PBMCs) with panels extending to 228 antibodies to construct a multimodal reference atlas of the circulating immune system. Multimodal analysis substantially improves our ability to resolve cell states, allowing us to identify and validate previously unreported lymphoid subpopulations. Moreover, we demonstrate how to leverage this reference to rapidly map new datasets and to interpret immune responses to vaccination and coronavirus disease 2019 (COVID-19). Our approach represents a broadly applicable strategy to analyze single-cell multimodal datasets and to look beyond the transcriptome toward a unified and multimodal definition of cellular identity.

Datasets

1. Human - Lung v2 (HLCA)
Metadata
assay_ontology_term_id
cell_type_ontology_term_id
development_stage_ontology_term_id
disease_ontology_term_id
self_reported_ethnicity_ontology_term_id
tissue_ontology_term_id
organism_ontology_term_id
sex_ontology_term_id
study
smoking_status
condition
subject_type
sample_type
3'_or_5'
sequencing_platform
cell_ranger_version
fresh_or_frozen
dataset
anatomical_region_level_2
anatomical_region_level_3
age
ann_finest_level
ann_level_1
ann_level_2
ann_level_3
ann_level_4
ann_level_5
ann_coarse_for_GWAS_and_modeling
suspension_type
tissue_type
cell_type
assay
disease
organism
sex
tissue
self_reported_ethnicity
development_stage
EFO:0009899 - 328218 cells
EFO:0011025 - 130357 cells
EFO:0009922 - 86308 cells
EFO:0009900 - 37081 cells
EFO:0009901 - 2920 cells
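
The per-assay cell counts above are the kind of tally obtained by grouping cells on the assay_ontology_term_id metadata field. A minimal sketch of that tally follows, assuming the per-cell metadata is available as a list of records. The records and the helper name count_cells_by are hypothetical and for illustration only. They are toy values, not rows from this dataset.

```python
from collections import Counter

# Hypothetical per-cell metadata records (toy values, not taken from the
# actual dataset). Field names follow the metadata list above.
cells = [
    {"assay_ontology_term_id": "EFO:0009899", "sex": "female"},
    {"assay_ontology_term_id": "EFO:0009899", "sex": "male"},
    {"assay_ontology_term_id": "EFO:0011025", "sex": "male"},
    {"assay_ontology_term_id": "EFO:0009922", "sex": "female"},
]


def count_cells_by(records, field):
    """Return (value, count) pairs for one metadata field, largest count first."""
    counts = Counter(record[field] for record in records)
    return counts.most_common()


# Tally cells per assay term and print one line per term.
assay_counts = count_cells_by(cells, "assay_ontology_term_id")
for term_id, n_cells in assay_counts:
    print(f"{term_id} - {n_cells} cells")
```

The same helper works for any other field in the metadata list, for example count_cells_by(cells, "sex").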

Source data

https://cellxgene.cziscience.com/collections/2f75d249-1bec-459b-bf2b-b86221097ced

Alias names

PMID34062119, PMC8238499

Cite this study

Hao, Y., Hao, S., Andersen-Nissen, E., Mauck, W.M., Zheng, S., Butler, A., Lee, M.J., Wilk, A.J., Darby, C., Zager, M., Hoffman, P. et al., 2021. Integrated analysis of multimodal single-cell data. Cell, 184(13), pp.3573-3587. https://doi.org/10.1016/j.cell.2021.04.048