This is a Python reimplementation of the deMULTIplex tool written by Chris McGinnis accompanying his paper:
McGinnis CS, Patterson DM, Winkler J, Conrad DN, Hein MY, Srivastava V, Hu JL, Murrow LM, Weissman JS, Werb Z, Chow ED, Gartner ZJ. MULTI-seq: sample multiplexing for single-cell RNA sequencing using lipid-tagged indices. Nat Methods. 2019 Jul;16(7):619-626. doi: 10.1038/s41592-019-0433-8. Epub 2019 Jun 17. PMID: 31209384; PMCID: PMC6837808.
This reimplementation has two design goals:
- easy reading of demultiplexed data generated with CITE-Seq-count
- seemless integratation with the Scanpy ecosystem
python -m pip install git+https://github.com/wflynny/multiseq-py.git
Load in tag-counts from CITE-Seq-count
import multiseq
import scanpy as sc
tags = multiseq.load_citeseq_count_matrix("path/to/umi_count")
If you want to intersect the tag data with corresponding gene expression data, we've tried to make that easier
gex = sc.read_10x_h5("path/to/gex_library/filtered_feature_bc_matrix.h5")
# Make sure barcodes are compatible (adds '-1' to tag bcs)
multiseq.harmonize_barcodes(gex, tags, action="extend")
gex, tags = multiseq.intersect_gex_and_tags(gex, tags, truth="gex")
The original MULTI-seq algorithm handles demultiplexing either by specifying a quantile threshold or sweeping through the quantiles and finding the optimal threshold. We can do both of those here:
multiseq.classify_cells(tags, q=0.5, inplace=True)
Rather than return calls of 'singlet', 'doublet', 'negative', this reimplementation
stores a binary matrix in tag_adata.layers["tag_calls"]
where value
X[i,j]
represents if cell i
is positive for tag j
. This helps distinguish
multiplets from doublets and a few others things. To get a single vector of
singlet, doublet, negative calls, we can do:
multiseq.call_singlets(tags, key_added="global_call")
tags.obs["global_call"].head()
Lastly, here's the quantile sweep functionality:
optimal_q, optimal_calls = multiseq.classify_by_sweep(
tags,
return_calls=True,
return_proportions=False
)
tags.layers["tag_calls"] = optimal_calls
multiseq.call_singlets(tags, key_added="global_call")
If you want to plot the distribution of 'singlet', 'doublet', 'negative' calls
as a function of quantile, you can do so by passing the return_proportions
keyword above:
import seaborn as sns
optimal_q, optimal_calls, call_proportions = multiseq.classify_by_sweep(
tags,
return_calls=True,
return_proportions=True
)
sns.lineplot(
data=call_proportions,
x="q",
y="proportion",
hue="subset"
)
Using the MIT license. If there is an issue with licensing, please let me know! And see the licensing discussion here.
Any help making this code better is appeciated. Please open an issue or PR!