plger / scDblFinder

Methods for detecting doublets in single-cell sequencing data

Home Page: https://plger.github.io/scDblFinder/

scDblFinder - known doublets

maryellenlynall opened this issue

Hi, thanks for a great package. When I run scDblFinder on a single cell experiment object with arguments knowns= and knownsUse="discard", the output sce$scDblFinder.class calls some of the known doublets as singlets.
The help for scDblFinder seems to state that with option "discard", the known doublets, while not used for training, should still be called as doublets, so I'm not sure why this is happening. I can of course just add those known doublets back in as doublets manually, but wondered if there was an issue with the scDblFinder code here?
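The manual workaround mentioned here can be sketched in base R. This is only an illustration on a mock data frame standing in for the cell metadata of the object: the column names scDblFinder.class and scDblFinder.score are the ones discussed in this thread, while `cells`, `is_known`, `class_with_knowns` and all values are hypothetical.

```r
# Mock stand-in for the per-cell output of scDblFinder (values are made up).
cells <- data.frame(
  scDblFinder.score = c(0.02, 0.91, 0.10, 0.75, 0.05),
  scDblFinder.class = factor(
    c("singlet", "doublet", "singlet", "doublet", "singlet"),
    levels = c("singlet", "doublet")
  )
)

# Hypothetical logical vector flagging the experimentally known doublets.
is_known <- c(FALSE, TRUE, TRUE, FALSE, FALSE)

# Known doublets that were nonetheless called singlets.
missed <- is_known & cells$scDblFinder.class == "singlet"

# Keep the original calls and store the enforced calls in a separate column,
# so that predicted and merely known doublets remain distinguishable.
cells$class_with_knowns <- cells$scDblFinder.class
cells$class_with_knowns[is_known] <- "doublet"
```

Writing the enforced calls to a new column rather than overwriting scDblFinder.class is a design choice of this sketch, not something the package does.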

Hi, where in the documentation do you see this written?
Perhaps the passage needs to be clarified, but with the 'discard' mode there is no enforcement that scDblFinder will call the known doublets as doublets. The simplest scenario is if the known doublets are homotypic (i.e. formed by two cells of the same type), and hence transcriptionally indistinguishable from singlets.

Hi, it's in the help when I do ?scDblFinder:
"'discard' (they are discarded for the purpose of training, but counted as positive)"
To me this implies they are called as positive. No problem if that isn't the behaviour, but it would help to clarify that sentence.

Ok thanks, you're right that's indeed very misleading.
They were counted for the purpose of calculating the threshold, but not assigned as doublets unless also predicted to be.
I've now enforced that they are marked as doublets in scDblFinder.class, while leaving the scDblFinder.score untouched, so that one can still distinguish those that are predicted from those that wouldn't be. I think this will make sense for most users.
I'll be pushing to Bioc devel once the checks have passed, and until then you can install from github.

Thanks for clarifying. I think it might be helpful to change the help documentation rather than the function's behaviour, so that people's existing scripts don't start working differently.

I think that's particularly true given that the threshold is only printed to the console and isn't included by default in the output object, so it's difficult to determine from the doublet score alone which known doublets were predicted by scDblFinder. I think the behaviour you had before was helpful; it's just the help that needed clarifying.
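To illustrate the point: if the class column were overwritten for known doublets, recovering which of them were actually predicted would require re-applying the threshold to the score by hand. A minimal base R sketch, assuming the threshold was noted down from the printed message and that a cell is called a doublet when its score is at or above it (both are assumptions; `cells`, `is_known` and the value 0.5 are hypothetical):

```r
# Mock per-cell scores and a hypothetical flag for known doublets.
cells <- data.frame(
  scDblFinder.score = c(0.02, 0.91, 0.10, 0.75, 0.05),
  is_known = c(FALSE, TRUE, TRUE, FALSE, FALSE)
)

# Hypothetical threshold, copied manually from the printed output.
threshold <- 0.5

# Re-derive the prediction from the score (assumed rule: score >= threshold).
cells$predicted <- cells$scDblFinder.score >= threshold

# Known doublets that the classifier also predicted, and those it missed.
known_predicted <- which(cells$is_known & cells$predicted)
known_missed <- which(cells$is_known & !cells$predicted)
```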

Changed back and updated the documentation.