plger / scDblFinder

Methods for detecting doublets in single-cell sequencing data

Home Page: https://plger.github.io/scDblFinder/

Running scDblFinder before or after normalization?

DHelix opened this issue and commented:

Hi,

First of all, thanks for developing this fantastic tool!
I wonder if I should run scDblFinder before or after normalization. Specifically, I'm working on 10X scRNA-seq data using the Seurat package. What data preprocessing steps do you recommend before running scDblFinder? I'm planning to do:

  1. Import 10X Cellranger outputs
  2. Filter out barcodes with very little coverage (e.g. UMI counts < 500)
  3. Run scDblFinder, and remove predicted doublets
  4. Perform additional filtering based on QC (e.g., based on % MT reads)
  5. Normalization (SCTransform)
  6. Dimensionality reduction and visualization (PCA, UMAP)
  7. Other downstream analyses

Does this make sense to you?
Thank you very much for your time and help in advance!

plger commented:

Yes, this is the order in which we normally proceed. scDblFinder works on raw counts.
In the cluster-based approach, though, if you do not supply clusters yourself, it will use normalized expression and PCA, when they are available, to determine the clusters. Even then, this affects only the clustering and nothing else.

Some people like to keep the doublets in until dimensionality reduction, in order to visualize them (and see, for instance, whether some clusters have a consistently high doublet rate and should be removed entirely, although that's seldom the case). In general this doesn't change the dimensionality reduction much, since doublets occupy the same space of variation as real cells and occur with a frequency proportional to the abundance of the different cell types.
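
To make the point about doublet frequency concrete, here is a minimal sketch in Python (it is not part of scDblFinder and does not reflect its internals). It assumes that doublets form by random pairing of two cells, independently of cell type, which is a simplification. The function name `expected_doublet_composition` and the cell-type proportions are hypothetical and chosen only for illustration.

```python
from itertools import combinations_with_replacement


def expected_doublet_composition(proportions):
    """Expected share of each cell-type pair among doublets.

    Assumes random, type-independent pairing of two cells (a simplification).
    `proportions` maps a cell-type name to its relative abundance.
    """
    total = sum(proportions.values())
    freqs = {name: value / total for name, value in proportions.items()}
    composition = {}
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        # Same-type pair: p_a^2. Mixed pair (order ignored): 2 * p_a * p_b.
        weight = 1.0 if a == b else 2.0
        composition[(a, b)] = weight * freqs[a] * freqs[b]
    return composition


# Hypothetical cell-type abundances, for illustration only.
proportions = {"T cell": 0.6, "B cell": 0.3, "Monocyte": 0.1}
composition = expected_doublet_composition(proportions)

for (a, b), share in sorted(composition.items(), key=lambda kv: -kv[1]):
    print(f"{a} + {b}: {share:.2f}")
```

Under this assumption, most doublets involve the abundant cell types, so they land in regions of expression space that are already well populated by real cells. That is consistent with the dimensionality reduction changing little when they are left in.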

DHelix commented:

Hi @plger, thank you so much for your quick and very informative reply! I'll keep your suggestions in mind when I run scDblFinder.