Running scDblFinder before or after normalization?

Question

DHelix opened this issue a year ago

Hi,

First of all, thanks for developing this fantastic tool!
I wonder if I should run scDblFinder before or after normalization. Specifically, I'm working on 10X scRNA-seq data using the Seurat package. What data preprocessing steps do you recommend before running scDblFinder? I'm planning to do:

1. Import 10X Cellranger outputs
2. Filter out barcodes with very little coverage (e.g. UMI counts < 500)
3. Run scDblFinder, and remove predicted doublets
4. Perform additional filtering based on QC (e.g., based on % MT reads)
5. Normalization (SCTransform)
6. Dimensionality reduction and visualization (PCA, UMAP)
7. Other downstream analyses

Does this make sense to you?
Thank you very much for your time and help in advance!

Pierre-Luc · Answer 1 · Wed Oct 11 2023 13:36:22 GMT+0800 (China Standard Time)

Yes, this would be the order in which we normally proceed. scDblFinder works on raw counts.
In the cluster-based approach, though, it will use normalized expression and PCA, if they are available, to determine the clusters (assuming that cluster labels are not supplied). Even then, this affects nothing other than the clusters.
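
To make the ordering concrete, here is a minimal toy sketch in plain Python. It is not the scDblFinder or Seurat API (both are R packages): the barcodes and counts are made up, `MAX_PCT_MT` is a hypothetical cutoff, and `doublet_calls` is a hard-coded placeholder standing in for whatever the doublet caller would return. The only point it illustrates is that the coverage filter, the doublet removal and the QC filter all operate on raw counts, with normalization coming afterwards.

```python
MIN_UMI = 500        # coverage threshold mentioned in the question
MAX_PCT_MT = 20.0    # hypothetical QC cutoff, chosen only for illustration

def total_umi(counts):
    """Total raw UMI count of one barcode."""
    return sum(counts.values())

def pct_mt(counts):
    """Percentage of raw counts coming from genes prefixed with 'MT-'."""
    total = total_umi(counts)
    mt = sum(n for gene, n in counts.items() if gene.startswith("MT-"))
    return 100.0 * mt / total if total else 0.0

# Toy raw count matrix: barcode -> {gene: UMI count}. All values are invented.
raw = {
    "AAAC": {"CD3E": 400, "MT-CO1": 50, "ACTB": 350},
    "AAAG": {"CD3E": 100, "ACTB": 150},
    "AACT": {"MS4A1": 300, "MT-CO1": 500, "ACTB": 200},
    "AAGG": {"CD3E": 500, "MS4A1": 450, "ACTB": 250},
}

# Step 2: drop barcodes with very little coverage (raw counts).
step2 = {bc: c for bc, c in raw.items() if total_umi(c) >= MIN_UMI}

# Step 3: placeholder doublet calls for the barcodes that survived step 2.
# In a real analysis these labels would come from the doublet caller.
doublet_calls = {"AAAC": "singlet", "AACT": "singlet", "AAGG": "doublet"}
step3 = {bc: c for bc, c in step2.items() if doublet_calls[bc] == "singlet"}

# Step 4: additional QC filtering, still on raw counts.
step4 = {bc: c for bc, c in step3.items() if pct_mt(c) <= MAX_PCT_MT}

# Normalization and dimensionality reduction would only start from step4.
print(sorted(step2), sorted(step3), sorted(step4))
```

In this toy data each step removes exactly one barcode, which makes it easy to see which filter is responsible for which removal.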

Some people like to keep the doublets in until dimensionality reduction, in order to visualize them (and see whether some clusters, for instance, have a generally high doublet rate and might be removed entirely, although that's seldom the case). In general this doesn't change the dimensionality reduction much, since doublets occupy the same space of variation as real cells and occur at a frequency proportional to the abundance of the different cell types.
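
The point about doublet frequency following cell type abundance can be illustrated with a small worked example. This is a sketch under a simplifying assumption that is not stated in the thread: doublets form by pairing cells at random, independently of their type. The cell type names and counts are invented, and `expected_doublet_composition` is a hypothetical helper, not part of scDblFinder.

```python
from itertools import combinations_with_replacement

def expected_doublet_composition(abundances):
    """Expected share of each unordered (type_a, type_b) doublet,
    assuming cells pair at random regardless of type."""
    total = sum(abundances.values())
    freqs = {name: count / total for name, count in abundances.items()}
    composition = {}
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        # A mixed pair can form in two orders (a then b, or b then a),
        # hence the factor 2 for heterotypic doublets.
        composition[(a, b)] = freqs[a] * freqs[b] * (1 if a == b else 2)
    return composition

# Invented abundances for three cell types.
abundances = {"T cell": 600, "B cell": 300, "Monocyte": 100}
composition = expected_doublet_composition(abundances)

# Share of doublets made of two different cell types.
heterotypic = sum(w for (a, b), w in composition.items() if a != b)

for pair, share in sorted(composition.items(), key=lambda kv: -kv[1]):
    print(pair, round(share, 3))
print("heterotypic share:", round(heterotypic, 3))
```

Under this assumption the most abundant cell types contribute the most doublets, so doublets are spread across the embedding roughly in proportion to where the real cells are, which is consistent with them having little effect on the dimensionality reduction.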

DHelix · Answer 2 · Thu Oct 12 2023 00:12:53 GMT+0800 (China Standard Time)

Hi @plger, Thank you so much for your quick and very informative reply!! I'll keep your suggestions in mind when I run scDblFinder.