EDePasquale / DoubletDecon

A tool for removing doublets from single-cell RNA-seq data

Too many doublets being called

piyushjo15 opened this issue

Hi,
I am using DoubletDecon on a Seurat v3 dataset, following the improved Seurat preprocessing steps. I have to use rho=0.7, as higher values throw an error. However, I am getting far too many doublets: out of 9660 cells, the algorithm flags 5765 as doublets.
I also processed another dataset with 5431 cells, and there it also calls 36% of the cells doublets. For that dataset, rho=0.6 works.
Could you help me understand what's going on?
Thanks,
Piyush

Hi Piyush,

Thank you for your interest in DoubletDecon! The first thing I would suggest is to make sure that you are using the correct rhop value for your dataset. I know you said that anything above 0.7 causes an error and that you have tried 0.7 and 0.6, but have you evaluated which clusters these values merge (or do not merge), and whether that makes sense biologically? You may be overcalling doublets because you are not merging clusters enough. Second, I would suggest setting only50=T, as this will pull out only the highest-confidence doublet predictions in DoubletDecon (though some true doublets may be left in the dataset with this setting). Third, I would consider running DoubletDecon multiple times (perhaps 20) and taking only the intersection of those results. Alternatively, we suggest that people run two doublet detection methods and use the intersection, so that you have a smaller number of higher-confidence doublet predictions.

Let me know if you have any more questions!

Best,
Erica
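
The intersection idea from the reply above (repeated runs, or two different methods) can be sketched in a few lines. This is only an illustration in Python, not part of DoubletDecon: the function name `consensus_doublets` and the barcodes are hypothetical, and it assumes each run's doublet calls have already been exported as a collection of cell barcodes.

```python
from functools import reduce


def consensus_doublets(runs):
    """Return the barcodes flagged as doublets in every run.

    `runs` is a list of collections of cell barcodes, one per run
    (or one per doublet detection method). Hypothetical helper.
    """
    if not runs:
        return set()
    # Intersect the call sets pairwise, left to right.
    return reduce(set.intersection, (set(r) for r in runs))


# Hypothetical barcodes flagged in three repeated runs.
runs = [
    {"AAAC", "AAAG", "AACT", "ACGT"},
    {"AAAC", "AACT", "ACGT", "TTTG"},
    {"AAAC", "ACGT", "CCCA"},
]

consensus = consensus_doublets(runs)
print(sorted(consensus))
```

Only the barcodes present in all three sets survive, so the consensus list can never be longer than the shortest individual run. The same function works unchanged for two methods by passing a two-element list.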

Hi Erica!

Thanks for the quick response. Biologically, I would say that the samples are early progenitors and don't contain much differentiated tissue; they mostly express early specification factors. Should I try calling fewer clusters in the Seurat analysis itself? I see that decreasing rho merges more clusters (assuming that is what the yellow block shows?), so I will also try lowering rho.
I also ran DoubletFinder, but it just returns as many doublets as I ask for. When I set the expected doublet rate to 10%, it called 10% of cells as doublets, and when I set it to 40%, it didn't saturate at a lower number but called 40% of cells as doublets.
I was hoping that I could estimate the doublet rate using DoubletDecon and then use that for DoubletFinder. I will post my results if calling fewer clusters or merging more clusters reduces the total number of doublets called.
Thanks

I generally think it is a better idea to correct granularity issues when creating clusters than to try to fix them after the fact with cluster merging. The main reason is that the cluster identities generated with Seurat determine the marker genes called for each of those clusters; the better the genes we can get at an early stage, the better all downstream analyses will be. You are correct: DoubletFinder does require you to give it an idea of how many doublets are in the input sample, and it uses that number to create its predictions. Can you get a good estimate of how many doublets should be in your data based on loading? We have found that the DoubletFinder team is really good about helping users, if you want to ask them how to deal with cases where you don't have an estimate of the doublet percentage. Either way, if you have any more questions, please reach out to us again!
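
As a rough illustration of estimating doublets from loading, the sketch below compares the called fraction from this thread with a loading-based expectation. It assumes a droplet platform and the commonly quoted rule of thumb of roughly 0.8% multiplets per 1,000 cells recovered. That figure is an assumption that does not come from this thread, so check the documentation for your own platform and chemistry. The function name `expected_doublets` is hypothetical.

```python
def expected_doublets(n_cells, rate_per_1000=0.008):
    """Estimate the doublet rate and count from the number of recovered cells.

    rate_per_1000 is an ASSUMED rule of thumb (about 0.8% per 1,000 cells);
    replace it with the value documented for your platform.
    """
    rate = rate_per_1000 * n_cells / 1000
    return rate, round(rate * n_cells)


# Cell and doublet counts reported for the first dataset in this thread.
n_cells, n_called = 9660, 5765

called_fraction = n_called / n_cells
rate, n_expected = expected_doublets(n_cells)

print(f"called:   {n_called} cells ({called_fraction:.1%})")
print(f"expected: {n_expected} cells ({rate:.1%}) under the assumed rule of thumb")
```

Under this assumption, the expected count is an order of magnitude below the number called, which is consistent with the overcalling discussed above. The expected count is also the kind of number a method that asks for an expected doublet total could be given.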