DCC: very high number of false positives (FP)
BarryDigby opened this issue
Not a bug, but I feel compelled to check my methods with you before publishing my results.
Results of a benchmark study are below:
The proportion of common circRNAs called by each tool, represented as a heatmap:
These results were produced with no BSJ read filtering, which any sensible user should apply to their results. My paper uses these figures and tables to stress the importance of BSJ filtering. DCC performs far better when a BSJ > 2 filter is applied, but I must say the FP rate is startling compared to the other tools.
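To make the comparison concrete, here is a minimal sketch of how the FP fraction could be computed against the simulated truth set, with and without a BSJ read cut-off. The data layout (a dict of junction coordinates to BSJ read counts), the function name `fp_fraction`, and the toy values are assumptions for illustration only; they are not taken from DCC or from the benchmark code.

```python
from typing import Dict, Set, Tuple

# Hypothetical junction key: (chrom, start, end, strand)
Junction = Tuple[str, int, int, str]


def fp_fraction(calls: Dict[Junction, int],
                truth: Set[Junction],
                min_reads: int = 1) -> float:
    """Fraction of retained calls that are absent from the simulated truth set."""
    # Keep only calls supported by at least `min_reads` BSJ reads.
    kept = {junction for junction, reads in calls.items() if reads >= min_reads}
    if not kept:
        return 0.0
    return len(kept - truth) / len(kept)


# Toy data (invented): two simulated circRNAs, four calls.
truth = {("chr1", 100, 500, "+"), ("chr2", 2000, 2600, "-")}
calls = {
    ("chr1", 100, 500, "+"): 12,
    ("chr2", 2000, 2600, "-"): 7,
    ("chr3", 50, 900, "+"): 1,   # not simulated, single supporting read
    ("chr4", 10, 400, "-"): 2,   # not simulated, two supporting reads
}

unfiltered = fp_fraction(calls, truth)            # no BSJ filtering
filtered = fp_fraction(calls, truth, min_reads=3)  # equivalent to BSJ > 2
print(unfiltered, filtered)
```

In this toy case both spurious calls have two or fewer supporting reads, so the BSJ > 2 cut-off removes them; real data will of course be less clean.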
I wanted to reach out and double-check my methods using DCC:
[Process name] [input file]
1. STAR 1st Pass (PE data)
2. SJDB File Generation (SJ.out.tab)
3. STAR 2nd Pass (PE reads, SJDB files)
Code available in these process blocks: https://github.com/nf-core/circrna/blob/2a5987b0e57a6bbe51bfd2bdbd2413bbe6a0431e/main.nf#L853-L992
4. STAR 2nd Pass (Mate 1, SJDB File)
5. STAR 2nd Pass (Mate 2, SJDB File)
6. DCC (Outputs from 4. & 5.)
Code available in these process blocks: https://github.com/nf-core/circrna/blob/2a5987b0e57a6bbe51bfd2bdbd2413bbe6a0431e/main.nf#L1062-L1230
STAR parameters, IIRC, are default parameters from the documentation: https://github.com/nf-core/circrna/blob/2a5987b0e57a6bbe51bfd2bdbd2413bbe6a0431e/nextflow.config#L51-L73
DCC version 0.5.0; I do not have the logs handy, I am afraid.
Sim data generation: https://github.com/BarryDigby/circRNA_simu
Hi,
The documentation (https://docs.circ.tools/en/latest/Detect.html#running-circtools-circrna-detection) recommends a minimum of 5 reads, and DCC's internal default requires a BSJ count of at least 2. We explicitly do not recommend running DCC without a BSJ filter, because we are aware of the importance of BSJ filtering.
If there is no specific scientific question that requires this low threshold, users should stay with the default parameters for BSJ count.
Thus, for the comparison I'd suggest running DCC with the default BSJ parameters and not with BSJ filtering disabled.
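For output that was already generated without filtering, a threshold can also be applied after the fact. The sketch below is only an illustration of that idea, not DCC's own implementation: the table layout (circRNA ID mapped to per-sample BSJ counts), the function names, and the `min_samples` parameter are assumptions. The default of 2 reads mirrors the internal default mentioned above, and 5 is the value recommended in the documentation.

```python
from typing import Dict, List


def passes_bsj_filter(counts: List[int],
                      min_reads: int = 2,
                      min_samples: int = 1) -> bool:
    """True if at least `min_samples` samples have >= `min_reads` BSJ reads."""
    return sum(count >= min_reads for count in counts) >= min_samples


def filter_table(table: Dict[str, List[int]],
                 min_reads: int = 2,
                 min_samples: int = 1) -> Dict[str, List[int]]:
    """Keep only circRNAs whose per-sample counts pass the BSJ filter."""
    return {
        circ_id: counts
        for circ_id, counts in table.items()
        if passes_bsj_filter(counts, min_reads, min_samples)
    }


# Toy count table (invented): circRNA ID -> BSJ reads in three samples.
table = {
    "circA": [5, 0, 6],
    "circB": [1, 1, 0],
    "circC": [2, 0, 0],
}

default_like = filter_table(table)                             # >= 2 reads in >= 1 sample
recommended = filter_table(table, min_reads=5, min_samples=2)  # >= 5 reads in >= 2 samples
print(sorted(default_like), sorted(recommended))
```

Filtering at detection time with the tool's own parameters remains preferable for a benchmark, since it reflects how the tool is meant to be run.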
Cheers,
Tobias