Multi-Modal and Cross-Platform Integration
Why integration matters
No single assay captures the full biology of a cell. Gene expression measured by scRNA-seq reflects current transcriptional activity but not chromatin accessibility, surface protein abundance, or spatial context. Multi-modal experiments capture two or more of these layers simultaneously, and integration analysis asks what additional biological insight is gained by looking at them together rather than separately.
Paired RNA and chromatin accessibility
The 10x Genomics Multiome kit profiles gene expression and chromatin accessibility (ATAC-seq) from the same nucleus, enabling direct linkage of accessibility to expression at single-cell resolution. ArchR is a widely used analysis framework for single-cell ATAC data, offering scalable peak calling, gene activity scoring, motif enrichment, and integration with RNA data. Weighted nearest neighbor (WNN) analysis, introduced in Seurat v4 and available in Seurat v5, builds a joint neighbor graph from paired modalities, weighting each modality per cell by how informative it is for that cell. A joint embedding such as a UMAP can then be computed from that graph.
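The idea of per-cell modality weighting can be illustrated with a small NumPy sketch. This is a simplified toy, not Seurat's WNN implementation: it scores each modality by how well its own neighbors predict a cell's profile relative to the other modality's neighbors, then converts the two scores into weights with a softmax. The function names, the Euclidean distance, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbors of each row of X (self excluded)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    return np.argsort(dist, axis=1)[:, :k]

def prediction_error(X, neighbors):
    """Distance between each cell and the mean profile of the given neighbors."""
    return np.linalg.norm(X - X[neighbors].mean(axis=1), axis=1)

def modality_weights(rna, atac, k=5, eps=1e-4):
    """Toy per-cell weights (w_rna, w_atac); the two weights sum to 1 per cell."""
    nn_rna, nn_atac = knn_indices(rna, k), knn_indices(atac, k)
    # A modality counts as informative for a cell when its own neighbors
    # predict the cell's profile better than the other modality's neighbors.
    s_rna = prediction_error(rna, nn_atac) / (prediction_error(rna, nn_rna) + eps)
    s_atac = prediction_error(atac, nn_rna) / (prediction_error(atac, nn_atac) + eps)
    # Two-way softmax written as a logistic, clipped for numerical stability.
    w_rna = 1.0 / (1.0 + np.exp(np.clip(s_atac - s_rna, -50, 50)))
    return w_rna, 1.0 - w_rna

# Hypothetical data: RNA separates two cell states, ATAC is pure noise.
rng = np.random.default_rng(0)
state = np.repeat([0.0, 10.0], 30)
rna = state[:, None] + rng.normal(size=(60, 5))
atac = rng.normal(size=(60, 5))
w_rna, w_atac = modality_weights(rna, atac)
print(f"mean RNA weight: {w_rna.mean():.2f}")
```

In this simulated case the RNA modality should receive most of the weight, since only RNA carries the cell-state signal. Real inputs would be reduced-dimension representations (for example PCA for RNA and LSI for ATAC) rather than raw counts.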
SCENIC+ uses paired RNA/ATAC data to constrain gene regulatory network (GRN) inference: it checks that predicted transcription factor binding sites lie in chromatin that is open in the cells where the corresponding regulon is active. Among approaches that do not require perturbation experiments, it is one of the more rigorous options currently available for single-cell GRN inference.
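The accessibility check described above can be sketched as a simple filter. This is a toy illustration of the principle, not the SCENIC+ pipeline or its API: the function name, the binary accessibility matrix, and the 50% threshold are assumptions.

```python
import numpy as np

def supported_peaks(accessibility, regulon_active, min_fraction=0.5):
    """Flag candidate binding-site peaks that are open where a regulon is active.

    accessibility  : (n_cells, n_peaks) binary matrix, 1 = peak is open
    regulon_active : (n_cells,) binary vector, 1 = regulon is active in the cell
    Returns a boolean mask over peaks and the per-peak open fraction.
    """
    acc = np.asarray(accessibility, dtype=bool)
    active = np.asarray(regulon_active, dtype=bool)
    if not active.any():
        raise ValueError("regulon is not active in any cell")
    # Fraction of regulon-active cells in which each peak is accessible.
    frac_open = acc[active].mean(axis=0)
    return frac_open >= min_fraction, frac_open

# Hypothetical example: 4 cells, 3 candidate peaks with a predicted TF motif.
accessibility = [[1, 0, 1],
                 [1, 0, 0],
                 [0, 1, 0],
                 [0, 1, 0]]
regulon_active = [1, 1, 0, 0]
keep, frac_open = supported_peaks(accessibility, regulon_active)
print(keep, frac_open)
```

Peaks that are mostly closed in the regulon-active cells, such as the second peak here, would be treated as unsupported binding sites under this simplified criterion.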
CITE-seq and surface protein integration
CITE-seq measures surface protein abundance alongside gene expression using antibody-derived tags (ADTs). It is particularly useful in immune profiling, where canonical surface markers often provide cleaner cell type discrimination than transcriptomics alone, and in settings where protein-level information is needed to connect single-cell data to flow cytometry or CyTOF results.
MOFA+ (Multi-Omics Factor Analysis) provides a generative framework for integrating heterogeneous data modalities, identifying latent factors that explain co-variation across multiple assay types. It is agnostic to the type of input modality and is particularly useful for experiments that integrate more than two data types.
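To make the notion of a shared latent factor concrete, the sketch below simulates two views driven by one hidden factor and recovers it. It deliberately swaps in a much simpler technique, a truncated SVD of the z-scored, concatenated views, in place of MOFA+'s Bayesian model with sparsity priors and per-view likelihoods. The function name and the simulated data are assumptions.

```python
import numpy as np

def shared_factors(views, n_factors=1):
    """Recover factors shared across views with an SVD on concatenated views.

    views : list of (n_cells, n_features) arrays measured on the same cells
    Returns factor scores (n_cells, n_factors) and the fraction of variance
    explained in each view.
    """
    scaled = [(v - v.mean(axis=0)) / v.std(axis=0) for v in views]
    U, s, Vt = np.linalg.svd(np.hstack(scaled), full_matrices=False)
    scores = U[:, :n_factors] * s[:n_factors]
    loadings = Vt[:n_factors]
    r2, start = [], 0
    for v in scaled:
        stop = start + v.shape[1]
        recon = scores @ loadings[:, start:stop]
        r2.append(1.0 - ((v - recon) ** 2).sum() / (v ** 2).sum())
        start = stop
    return scores, r2

# Hypothetical data: one hidden factor drives both an RNA and a protein view.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
rna = np.outer(z, rng.uniform(0.5, 1.5, size=20)) + 0.5 * rng.normal(size=(200, 20))
adt = np.outer(z, rng.uniform(0.5, 1.5, size=10)) + 0.5 * rng.normal(size=(200, 10))
scores, r2 = shared_factors([rna, adt])
print("variance explained per view:", [round(float(x), 2) for x in r2])
```

The per-view variance explained is the kind of summary that helps distinguish factors shared across modalities from factors specific to one assay.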
Single-cell and spatial integration
Single-cell transcriptomics provides cellular resolution and whole-transcriptome coverage but lacks spatial context. Spatial transcriptomics provides location, but on sequencing-based platforms such as Visium each spot can contain several cells, so resolution is below the single-cell level. Integrating the two leverages the complementary strengths of each: single-cell data provides a high-resolution reference for cell type annotation, while spatial data grounds those annotations in tissue architecture.
Cell2location is a widely used method for deconvolving cell type compositions in Visium spots using a single-cell reference. It uses a hierarchical Bayesian model that accounts for technical factors, including sequencing depth and the expected cell density per spot. For Xenium and other imaging-based platforms where individual cells are resolved, integration is more direct: the single-cell data serve as a reference for annotating the segmented cells and for pathway analysis.
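The core of spot deconvolution, expressing a spot's counts as a non-negative mixture of reference cell type signatures, can be sketched as follows. This is not cell2location's model: it swaps in plain non-negative least squares solved by multiplicative updates, with no priors and no modeling of sequencing depth or cell density. The function name and the toy signatures are assumptions.

```python
import numpy as np

def deconvolve_spot(spot_counts, signatures, n_iter=200, eps=1e-12):
    """Estimate non-negative cell type abundances for one spot.

    spot_counts : (n_genes,) expression observed in the spot
    signatures  : (n_types, n_genes) mean expression per cell type,
                  estimated from a single-cell reference
    Minimizes ||spot_counts - signatures.T @ w||^2 subject to w >= 0 using
    multiplicative updates, which keep w non-negative at every step.
    """
    S = np.asarray(signatures, dtype=float)
    y = np.asarray(spot_counts, dtype=float)
    w = np.ones(S.shape[0])
    numer = S @ y
    for _ in range(n_iter):
        w *= numer / (S @ (S.T @ w) + eps)
    return w

# Hypothetical reference signatures: 3 cell types x 5 genes.
signatures = np.array([[10, 1, 0, 0, 2],
                       [0, 8, 1, 0, 2],
                       [0, 0, 1, 9, 2]], dtype=float)
true_abundance = np.array([3.0, 1.0, 2.0])
spot = signatures.T @ true_abundance      # noise-free mixture for illustration
w = deconvolve_spot(spot, signatures)
print("estimated proportions:", np.round(w / w.sum(), 3))
```

A probabilistic model adds what this sketch lacks: count noise, uncertainty estimates, and priors on the number of cells per spot.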
Nicheformer is a spatially aware foundation model, introduced in 2024, that incorporates neighborhood context into each cell's representation, capturing signaling microenvironments that are invisible to models treating each cell independently.
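As a much simpler illustration of what neighborhood context means, the sketch below augments each cell with the cell type composition of its nearest spatial neighbors. It is a hand-built feature, not Nicheformer's learned representation. The function name, the choice of k, and the toy coordinates are assumptions.

```python
import numpy as np

def niche_composition(coords, labels, k=2):
    """Cell type composition of each cell's k nearest spatial neighbors.

    coords : (n_cells, 2) spatial coordinates
    labels : (n_cells,) cell type label per cell
    Returns the sorted cell types and an (n_cells, n_types) matrix whose rows
    sum to 1.
    """
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    types = np.unique(labels)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)            # a cell is not its own neighbor
    nn = np.argsort(dist, axis=1)[:, :k]
    comp = np.stack([(labels[nn] == t).mean(axis=1) for t in types], axis=1)
    return types, comp

# Hypothetical tissue: a cluster of B cells and a cluster of T cells.
coords = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels = ["B", "B", "B", "T", "T", "T"]
types, comp = niche_composition(coords, labels)
print(types)
print(comp)
```

Two cells with identical expression profiles but different composition vectors sit in different microenvironments, which is the kind of distinction a cell-independent model cannot make.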