Symphony

Efficient and precise single-cell reference atlas mapping with Symphony

Preprint: https://www.biorxiv.org/content/10.1101/2020.11.18.389189v1

Installation

Install the current version of Symphony from GitHub with:

# install.packages("devtools")
devtools::install_github("immunogenomics/symphony")

Installation notes:

  • You may need to install the latest version of devtools: GitHub's recent change of the default branch name from “master” to “main” can cause install_github to fail with older versions.
  • You may also need to install the latest version of Harmony:
devtools::install_github("immunogenomics/harmony")

Usage/Demos

Quick start

Check out the quick start tutorial.

Reference building

Option 1: Starting from a reference genes-by-cells matrix

The buildReference function performs all steps of the reference building pipeline, including variable gene selection, scaling, PCA, Harmony, and Symphony compression.

library(symphony)

# Build reference
reference = buildReference(
    ref_exp,                 # reference genes by cells matrix
    ref_metadata,            # dataframe with cell metadata
    vars = c('donor'),       # variable(s) to integrate over
    K = 100,                 # number of Harmony clusters
    verbose = TRUE,          # display output?
    do_umap = TRUE,          # run UMAP and save UMAP model to file?
    do_normalize = FALSE,    # normalize the expression matrix?
    vargenes_method = 'vst', # 'vst' or 'mvp'
    topn = 2000,             # number of variable genes to use
    d = 20,                  # number of dimensions for PCA
    save_uwot_path = '/absolute/path/uwot_model_1' # filepath to save UMAP model
)

Option 2: Starting from an existing Harmony object

The buildReferenceFromHarmonyObj function compresses an existing Harmony object into a Symphony reference that enables query mapping. We recommend this option if you would like your code to be more modular and flexible.

library(harmony)
library(symphony)

# Run Harmony to integrate the reference cells
ref_harmObj = HarmonyMatrix(
        data_mat = t(Z_pca_ref),   # starting embedding (e.g. PCA, CCA) of cells
        meta_data = ref_metadata,  # dataframe with cell metadata
        theta = c(2),              # cluster diversity enforcement
        vars_use = c('donor'),     # variable to integrate out
        nclust = 100,              # number of clusters in Harmony model
        max.iter.harmony = 10,     # maximum number of Harmony iterations
        return_object = TRUE,      # set to TRUE to return the full Harmony object
        do_pca = FALSE             # do not recompute PCs
)

# Build Symphony reference
reference = buildReferenceFromHarmonyObj(
        ref_harmObj,            # output object from HarmonyMatrix()
        ref_metadata,           # dataframe with cell metadata
        vargenes_means_sds,     # gene names, means, and std devs for scaling
        loadings,               # genes x PCs
        verbose = TRUE,         # display output?
        do_umap = TRUE,         # run UMAP and save UMAP model to file?
        save_uwot_path = '/absolute/path/uwot_model_1' # filepath to save UMAP model
)

Note that vargenes_means_sds requires column names c('symbol', 'mean', 'stddev') (see tutorial example).
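
The inputs to Option 2 (Z_pca_ref, loadings, and vargenes_means_sds) are not defined in the snippet above. As a minimal sketch, assuming a small made-up matrix of already-normalized expression restricted to the variable genes, they could be derived with base R as shown below. The toy data, the use of sd() (the sample standard deviation), and a full svd() in place of a truncated PCA are illustrative assumptions, not the package's own procedure; the tutorial example remains the authoritative version.

```r
# Toy normalized expression: 3 variable genes x 4 cells (made-up values)
ref_exp_vargenes <- matrix(
    c(0, 1, 2, 3,
      2, 2, 2, 6,
      1, 0, 1, 0),
    nrow = 3, byrow = TRUE,
    dimnames = list(c('GeneA', 'GeneB', 'GeneC'), paste0('cell', 1:4))
)

# Per-gene means and standard deviations, with the required column names
vargenes_means_sds <- data.frame(
    symbol = rownames(ref_exp_vargenes),
    mean   = unname(rowMeans(ref_exp_vargenes)),
    stddev = unname(apply(ref_exp_vargenes, 1, sd)),
    stringsAsFactors = FALSE
)

# Scale each gene (row) to zero mean and unit standard deviation;
# the length-3 vectors are recycled down the rows of the 3 x 4 matrix
ref_exp_scaled <- (ref_exp_vargenes - vargenes_means_sds$mean) /
    vargenes_means_sds$stddev

# PCA via singular value decomposition of the scaled matrix
s <- svd(ref_exp_scaled)
loadings  <- s$u                    # genes x PCs
Z_pca_ref <- diag(s$d) %*% t(s$v)   # PCs x cells
rownames(loadings) <- rownames(ref_exp_vargenes)
```

With Z_pca_ref laid out as PCs by cells, t(Z_pca_ref) gives the cells-by-PCs embedding passed as data_mat in the HarmonyMatrix() call above.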

Query mapping

Once you have a prebuilt reference (e.g. loaded from a saved .rds R object), you can map new query cells onto it starting from query gene expression.
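
The "saved .rds R object" mentioned above refers to base R serialization with saveRDS() and readRDS(). A minimal sketch of the round trip follows, using a placeholder list in place of a real reference; the fields of the placeholder are made up for illustration and do not reflect the actual structure of a Symphony reference.

```r
# Placeholder standing in for the object returned by buildReference()
reference <- list(note = 'stand-in for a Symphony reference', K = 100)

# Save the reference to disk (a temporary file is used here for illustration)
ref_path <- tempfile(fileext = '.rds')
saveRDS(reference, ref_path)

# Later, e.g. in a new R session, load it back before mapping a query
reference_loaded <- readRDS(ref_path)
```

Because the UMAP model is written separately to save_uwot_path, that file presumably needs to stay available at the same path; this is an inference from the argument's name and description rather than something stated here.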

# Map query
query = mapQuery(query_exp, query_metadata, reference, do_normalize = FALSE)

query$Z contains the harmonized query feature embedding.

If your query itself has multiple sources of batch variation you would like to integrate over (e.g. technology, donors, species), you can specify them in the vars parameter.

# Map query
query = mapQuery(query_exp, query_metadata, reference, vars = c('donor', 'technology'), do_normalize = FALSE)
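
Before mapping, it can help to confirm that the query inputs are consistent with each other. The helper below is a hypothetical sketch (check_query_inputs is not part of Symphony). It assumes that the query matrix is genes by cells, that query_metadata has one row per cell, and that each entry of vars names a column of query_metadata.

```r
# Hypothetical helper, not part of Symphony: basic consistency checks.
# Returns a character vector of problems (empty if none were found).
check_query_inputs <- function(query_exp, query_metadata, vars = NULL) {
    problems <- character(0)
    if (ncol(query_exp) != nrow(query_metadata)) {
        problems <- c(problems, sprintf(
            'expression has %d cells but metadata has %d rows',
            ncol(query_exp), nrow(query_metadata)))
    }
    missing_vars <- setdiff(vars, colnames(query_metadata))
    if (length(missing_vars) > 0) {
        problems <- c(problems, paste(
            'metadata is missing column(s):',
            paste(missing_vars, collapse = ', ')))
    }
    problems
}

# Toy inputs: 2 genes x 3 cells (made-up values)
query_exp <- matrix(1:6, nrow = 2,
                    dimnames = list(c('GeneA', 'GeneB'), paste0('cell', 1:3)))
query_metadata <- data.frame(donor = c('d1', 'd1', 'd2'))

# 'donor' is present; 'technology' is absent from the toy metadata
ok_result  <- check_query_inputs(query_exp, query_metadata, vars = 'donor')
bad_result <- check_query_inputs(query_exp, query_metadata,
                                 vars = c('donor', 'technology'))
```

In this toy case ok_result is empty, while bad_result reports the missing technology column.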

Reproducing results from manuscript

Code to reproduce Symphony results from the Kang et al. manuscript will be made available on github.com/immunogenomics/referencemapping.

About

License: Other

Languages

R 50.8%, C++ 49.2%