High doublet rate?
yesitsjess opened this issue
I'm getting 33.4% of my cells (barcodes) predicted to be doublets (27.5% with clusters=FALSE), and I've read that something in the region of 10% is more usual. Any suggestions on what might have caused this, or on whether I'm doing something wrong?
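library(DropletUtils)  # read10xCounts()
library(scater)        # logNormCounts(), runPCA(), runUMAP(), plotReducedDim()
library(scDblFinder)   # fastcluster(), scDblFinder()
# read 10x cellranger count output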
sce <- read10xCounts(paste0(data_dir, samps_dir, "/outs/filtered_feature_bc_matrix"), samps_dir)
# log normalise, perform PCA and generate UMAP
sce <- scater::logNormCounts(sce)
sce <- scater::runPCA(sce)
sce <- scater::runUMAP(sce)
plotReducedDim(sce, "UMAP")
# get clusters to run doublet finding function using cluster information
sce$cluster <- fastcluster(sce)
# identify suspected doublets
sce <- scDblFinder(sce, clusters="cluster")
#sce <- scDblFinder(sce, clusters=FALSE) # alternatively, without cluster information
table(sce$scDblFinder.class)
I've also tried a quick clustering of my own (rather than using fastcluster) and still get 23.2% of cells called as doublets:
g <- scran::buildSNNGraph(sce)
cl <- igraph::cluster_fast_greedy(g)$membership
sce$cluster <- cl
My dataset is basically all one cell type, so I would expect a low number of clusters. Will this affect things? Also, I haven't done any additional QC here; I'm just using the output from cellranger count (empty droplets filtered out). I was planning to import the doublet predictions from scDblFinder as a QC step in my main pipeline, because I'm using cellbender remove-background and wasn't sure whether that would make my counts incompatible with doublet detection.
scDblFinder v1.16.0
Hi,
what is samps_dir, and what is ncol(sce)?
samps_dir is a vector containing the sample directory names (as output by the cellranger count run):
[1] "SITTA8" "SITTB8" "SITTC8" "SITTD7" "SITTD8" "SITTE7" "SITTE8" "SITTF7" "SITTF8" "SITTG7" "SITTG8" "SITTH8"
> ncol(sce)
[1] 75861
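As a rough illustration of why pooling all samples inflates the doublet calls, here is a small sketch. It assumes the common 10x rule of thumb of about 1% doublets per 1,000 cells captured in a single capture (scDblFinder's default expected rate, the dbr argument, is documented along these lines; check ?scDblFinder for your version). It also hypothetically assumes the 75861 cells are split evenly across the 12 samples, since per-sample counts aren't given here. The function name expected_dbr is illustrative and not part of scDblFinder.

```r
# Hypothetical helper: rule-of-thumb expected doublet rate for ONE 10x capture,
# assuming ~1% doublets per 1,000 cells captured (an assumption, not scDblFinder's code).
expected_dbr <- function(n_cells, rate_per_1k = 0.01) {
  rate_per_1k * n_cells / 1000
}

n_total   <- 75861  # ncol(sce) reported above
n_samples <- 12     # length(samps_dir) reported above

# Treating every cell as if it came from a single capture (pooled run)
pooled_dbr <- expected_dbr(n_total)

# Assuming (hypothetically) an even split of cells across samples
per_sample_dbr <- expected_dbr(n_total / n_samples)

# Expected doublet rates in percent
round(100 * c(pooled = pooled_dbr, per_sample = per_sample_dbr), 1)
# pooled: 75.9, per_sample: 6.3
```

Pooled, the expectation comes out around 76% versus roughly 6% per sample. scDblFinder's thresholding does not simply call that fraction, but an inflated expected rate plausibly pushes it toward calling more doublets.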
It's always a good idea to read the "Getting started" documentation:
https://plger.github.io/scDblFinder/articles/scDblFinder.html#multiple-samples
So run it sample by sample rather than on the whole dataset.

Thanks, I'll try it.
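Following the linked documentation, the per-sample run would look roughly like sce <- scDblFinder(sce, samples="Sample"), assuming the Sample column that read10xCounts() adds to colData when it is given several sample directories (the samples argument can also take a vector of per-cell sample labels). The rationale given there is that doublets can only form within a single capture, so each sample should be processed separately. Afterwards it may help to look at the doublet fraction per sample rather than overall. The helper below, doublet_fraction_by_sample, is a hypothetical base R sketch, demonstrated on made-up labels rather than data from this thread.

```r
# Hypothetical helper: fraction of cells classified as "doublet" in each sample.
# `sample` and `class` are parallel per-cell vectors, e.g. sce$Sample and
# sce$scDblFinder.class after running scDblFinder(sce, samples = "Sample").
doublet_fraction_by_sample <- function(sample, class) {
  is_doublet <- as.character(class) == "doublet"
  # tapply() takes the mean of the logical vector within each sample,
  # i.e. the proportion of doublet calls per sample
  tapply(is_doublet, sample, mean)
}

# Toy labels (made up for illustration, not real results)
toy_sample <- c("A", "A", "A", "A", "B", "B", "B", "B")
toy_class  <- factor(c("singlet", "doublet", "singlet", "singlet",
                       "singlet", "singlet", "singlet", "singlet"),
                     levels = c("singlet", "doublet"))

res <- doublet_fraction_by_sample(toy_sample, toy_class)
res
```

Comparing each sample's fraction with its own expected rate (which depends on that sample's cell count) is a more meaningful check than the overall percentage across all 12 samples.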