scDblFinder - known doublets

Question

maryellenlynall opened this issue 2 years ago

Hi, thanks for a great package. When I run scDblFinder on a single cell experiment object with arguments knowns= and knownsUse="discard", the output sce$scDblFinder.class calls some of the known doublets as singlets.
The help for scDblFinder seems to state that with option "discard", the known doublets, while not used for training, should still be called as doublets, so I'm not sure why this is happening. I can of course just add those known doublets back in as doublets manually, but wondered if there was an issue with the scDblFinder code here?

Pierre-Luc · Answer 1 · Tue Mar 21 2023 22:39:10 GMT+0800 (China Standard Time)

Hi, where in the documentation do you see this written?
Perhaps the passage needs to be clarified, but with the 'discard' mode there is no enforcement that scDblFinder will call the known doublets as doublets. The simplest scenario is if the known doublets are homotypic (i.e. formed by two cells of the same type), and hence transcriptionally indistinguishable from singlets.

Mary-Ellen Lynall · Answer 2 · Thu Mar 23 2023 03:37:22 GMT+0800 (China Standard Time)

Hi, it's in the help when I do ?scDblFinder:
"'discard' (they are discarded for the purpose of training, but counted as positive)"
To me this implies they are called as positive. No problem if that isn't the behaviour, but it would help to clarify that sentence.

Pierre-Luc · Answer 3 · Thu Mar 23 2023 15:25:48 GMT+0800 (China Standard Time)

Ok thanks, you're right, that's indeed very misleading.
They were counted for the purpose of calculating the threshold, but not assigned as doublets unless also predicted to be.
I've now enforced that they are marked as doublets in scDblFinder.class, while leaving the scDblFinder.score untouched, so that one can still distinguish those that are predicted from those that wouldn't be. I think this will make sense for most users.
I'll be pushing to Bioc devel once the checks have passed, and until then you can install from github.

Mary-Ellen Lynall · Answer 4 · Thu Mar 23 2023 17:07:25 GMT+0800 (China Standard Time)

Thanks for clarifying. I think it might be helpful to change the help documentation rather than the function's behaviour, so that people's existing scripts don't start working differently.

Mary-Ellen Lynall · Answer 5 · Thu Mar 23 2023 20:19:26 GMT+0800 (China Standard Time)

I think that's particularly true given that the threshold is printed to the console but isn't included by default in the output object, so it's difficult to determine which known doublets were predicted by scDblFinder from the doublet score alone. I think the behaviour you had before was helpful; it's just the help that needed clarifying.

Pierre-Luc · Answer 6 · Thu Mar 23 2023 21:21:25 GMT+0800 (China Standard Time)

Changed back and updated doc
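
For anyone landing here later: with the behaviour restored, scDblFinder.class reflects only what was predicted, so the manual step mentioned in the question is still needed if known doublets should always end up labelled as doublets. Below is a minimal base R sketch of that step, not code from the package. It uses a plain data frame as a stand-in for the cell metadata, the column name known_doublet and the output column final.class are hypothetical, and the scores and calls are made-up illustration values.

```r
# Stand-in for the per-cell metadata after running scDblFinder.
# `known_doublet` is a hypothetical logical column marking the cells that were
# supplied as known doublets; the values below are invented for illustration.
cells <- data.frame(
  scDblFinder.score = c(0.02, 0.95, 0.10, 0.88, 0.30),
  scDblFinder.class = factor(
    c("singlet", "doublet", "singlet", "doublet", "singlet"),
    levels = c("singlet", "doublet")
  ),
  known_doublet = c(FALSE, TRUE, TRUE, FALSE, TRUE)
)

# Known doublets that were also predicted as doublets. This relies on the
# class column alone, so the threshold is not needed.
predicted_known <- cells$known_doublet & cells$scDblFinder.class == "doublet"

# Leave the original calls untouched and store the override in a new column.
cells$final.class <- cells$scDblFinder.class
cells$final.class[cells$known_doublet] <- "doublet"

# Cross-tabulate original calls against the overridden ones.
print(table(original = cells$scDblFinder.class, final = cells$final.class))
```

Writing the override to a separate column keeps the predicted calls available, which is the distinction discussed above. On a real SingleCellExperiment object the same two assignments should work with sce$... in place of cells$..., assuming a logical per-cell vector of known doublets is available.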