dieterich-lab / DCC

DCC uses output from the STAR read mapper to systematically detect back-splice junctions in next-generation sequencing data. DCC applies a series of filters and integrates data across replicate sets to arrive at a precise list of circRNA candidates.

Home Page: https://dieterichlab.org/software/

Fail at 'Filtering by read counts'

LGray95 opened this issue

Hi Tobias,

I am attempting to run DCC on my data, but the program does not get past the 'Filtering by read counts' step. I have the most recent version from GitHub installed.

Here is the log file from my most recent attempt.

2019-05-24 17:12:22,545 DCC 0.4.7 started
2019-05-24 17:12:22,545 DCC command line: /srv/scratch/janitz/tools/DCC/DCC/main.py @/srv/scratch/z5199519/pbs_scripts/DCC/samplesheet -mt1 @/srv/scratch/z5199519/pbs_scripts/DCC/mate1 -mt2 @/srv/scratch/z5199519/pbs_scripts/DCC/mate2 -D -R /srv/scratch/janitz/tools/DCC//hg19_repeats.gtf -an /srv/scratch/janitz/genome_files/hg19/Annotation/Genes/genes.gtf -Pi -F -M -Nr 5 6 -fg -G -A /srv/scratch/janitz/genome_files/hg19/Sequence/WholeGenomeFasta/genome.fa
2019-05-24 17:12:22,574 Starting to detect circRNAs
2019-05-24 17:12:22,574 Stranded data mode
2019-05-24 17:12:22,574 Please make sure that the read pairs have been mapped both, combined and on a per mate basis
2019-05-24 17:12:22,574 Collecting chimera information from mates-separate mapping
2019-05-24 17:43:26,836 started circRNA detection from file _tmp_DCC/controlChimeric.out.junction.VKD59O
2019-05-24 17:43:26,836 started circRNA detection from file _tmp_DCC/enrichedChimeric.out.junction.4TAWQ5
2019-05-24 17:44:44,919 Read 38314873.-.38315053.SRR445016.145270886 has more than 2 count.
2019-05-24 17:44:44,939 Read 38314873.-.38315053.SRR445016.145270886 has more than 2 count.
2019-05-24 17:50:55,383 finished circRNA detection from file _tmp_DCC/controlChimeric.out.junction.VKD59O
2019-05-24 20:23:24,732 Read 38314873.-.38315053.SRR445016.145270886 has more than 2 count.
2019-05-24 20:26:19,772 finished circRNA detection from file _tmp_DCC/enrichedChimeric.out.junction.4TAWQ5
2019-05-24 20:26:19,773 Combining individual circRNA read counts
2019-05-24 20:26:34,983 Write in annotation
2019-05-24 20:26:34,983 Select gene features in Annotation file
2019-05-24 20:28:04,548 Filtering started
2019-05-24 20:28:04,549 Using files _tmp_DCC/tmp_circCount and _tmp_DCC/tmp_coordinates for filtering
2019-05-24 20:28:08,399 Filtering by read counts

Just to check that this is correct: to produce the repeats GTF file, I simply downloaded the two .gtf files from UCSC and combined them into a new file with cat.
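
Concatenating GTF files with cat is generally fine, because every GTF line is an independent record. One pitfall is a first file that does not end with a newline: cat then glues its last record onto the first record of the next file. The following is a minimal Python sketch, not part of DCC, that reproduces what cat does and flags non-comment lines that lack the nine tab-separated GTF columns. The script name and the file names in the usage comment are hypothetical.

```python
"""Sanity-check GTF files combined the way `cat a.gtf b.gtf > combined.gtf` would."""
import sys
from typing import Iterable, List


def cat(texts: Iterable[str]) -> str:
    """Join file contents exactly as cat does: no separator is inserted."""
    return "".join(texts)


def malformed_lines(gtf_text: str) -> List[int]:
    """Return 1-based numbers of non-comment lines without exactly 9 tab-separated fields."""
    bad = []
    for lineno, line in enumerate(gtf_text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        if len(line.split("\t")) != 9:
            bad.append(lineno)
    return bad


if __name__ == "__main__":
    # Hypothetical usage: python check_gtf.py first_repeats.gtf second_repeats.gtf
    contents = []
    for path in sys.argv[1:]:
        with open(path) as handle:
            contents.append(handle.read())
    problems = malformed_lines(cat(contents))
    print(f"{len(problems)} malformed line(s); first few: {problems[:10]}")
```

A merged pair of records shows up as a single line with 17 fields, so any reported line number points directly at the file boundary that needs a newline.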

I am also running this on a Linux HPC cluster.

Thanks for your help!

Lachlan

Dear @LGray95,

Thank you for reporting your issue.

Just to be sure: did you try running with less strict filter criteria, e.g. -Nr 2 2, to make sure the problem is not caused by too few circRNAs reaching the read-count threshold?
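
The -Nr option takes two numbers: a minimum read count and a minimum number of samples that must reach it (Lachlan's "at least two counts in two samples" later in this thread). The sketch below illustrates that rule in plain Python; it is not DCC's own code, it assumes "reach" means greater than or equal, and the candidate names and counts are made-up toy values. It also shows one plausible reason why nothing survives the original run: the log lists only two junction files (control and enriched), and if the sample sheet really holds just two samples, -Nr 5 6 asks for six samples and no candidate can ever pass.

```python
from typing import Dict, List


def passes_count_filter(counts: List[int], min_count: int, min_samples: int) -> bool:
    """True if at least `min_samples` samples have >= `min_count` reads (assumed rule)."""
    return sum(1 for c in counts if c >= min_count) >= min_samples


def filter_candidates(
    table: Dict[str, List[int]], min_count: int, min_samples: int
) -> Dict[str, List[int]]:
    """Keep only candidates whose per-sample counts pass the filter."""
    return {
        cid: counts
        for cid, counts in table.items()
        if passes_count_filter(counts, min_count, min_samples)
    }


if __name__ == "__main__":
    # Hypothetical two-sample table: one count per sample for each candidate.
    table = {"circ_A": [3, 12], "circ_B": [1, 2], "circ_C": [6, 40]}
    print(filter_candidates(table, 5, 6))  # {} : six samples cannot be reached with two
    print(filter_candidates(table, 2, 2))  # {'circ_A': [3, 12], 'circ_C': [6, 40]}
```

Whatever the exact implementation, a sample threshold larger than the number of samples in the sample sheet empties the candidate list, so it is worth checking both numbers against the experimental design.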

You could also disable filtering with the repeat file to make sure that nothing is lost in that step.
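
To get a feel for how much a repeat annotation can remove, here is a rough sketch of repeat-based filtering. It is not DCC's implementation. For illustration only, it assumes that a candidate is discarded when either back-splice coordinate falls inside a repeat interval on the same chromosome. DCC's actual criterion may differ, and the function names and toy coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end) of a back-splice junction, 1-based inclusive as in GTF.
Candidate = Tuple[str, int, int]


def in_any(pos: int, intervals: List[Tuple[int, int]]) -> bool:
    """True if pos lies inside any (start, end) interval, ends inclusive."""
    return any(start <= pos <= end for start, end in intervals)


def split_by_repeats(
    candidates: Dict[str, Candidate],
    repeats: Dict[str, List[Tuple[int, int]]],
) -> Tuple[List[str], List[str]]:
    """Return (kept, dropped) candidate ids.

    ASSUMPTION: a candidate is dropped when either junction coordinate falls
    inside a repeat interval on the same chromosome; DCC's own rule may differ.
    """
    kept, dropped = [], []
    for cid, (chrom, start, end) in candidates.items():
        intervals = repeats.get(chrom, [])
        if in_any(start, intervals) or in_any(end, intervals):
            dropped.append(cid)
        else:
            kept.append(cid)
    return kept, dropped


if __name__ == "__main__":
    cands = {
        "circ_A": ("chr1", 100, 500),
        "circ_B": ("chr1", 1000, 2000),
        "circ_C": ("chr2", 100, 500),
    }
    reps = {"chr1": [(450, 600)]}
    print(split_by_repeats(cands, reps))
```

Comparing the number of dropped candidates with the total gives a rough idea of the repeat filter's impact; running DCC once without the repeat file is the direct way to check it.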

Cheers,
Tobias

Hi Tobias,

Thank you for your reply. I should have properly understood the default settings and adjusted them to suit my experiment.

After adjusting the filter to require at least two counts in at least two samples, I now have the required output files.

Cheers,

Lachlan Gray