shahab-sarmashghi / RESPECT

Estimating repeat spectra and genome length from low-coverage genome skims

Paired-end, single-end

mylena-s opened this issue

Hi!
I have a question regarding the input files. I have paired-end reads in three files for a single sample: PE1.fq, PE2.fq and SE.fq (unpaired). I tried running RESPECT specifying only the directory that contains all the files, and each file was then used to make an independent estimate. Should I concatenate all the files, or interleave the paired files?

Thanks in advance

Hi,
Ideally, I would recommend first merging the read pairs (e.g. with BBMerge) and then concatenating the merged reads with the single-end reads. However, you can also simply concatenate all the files and see what the results look like; I do not expect that leaving overlapping reads unmerged will have a large impact. Lastly, you do not need high coverage with RESPECT. In fact, high coverage slows down the computation, and we have optimized the RESPECT algorithm for low coverage. In our benchmarking, 1X to 4X coverage of the genome should be enough.
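
For the simple concatenation route, a minimal Python sketch might look like the one below. It is not part of RESPECT: the helper names (`concatenate_fastq`, `count_bases`, `approx_coverage`) and the output file name are made up for illustration, the files are assumed to be uncompressed 4-line FASTQ, and the genome length passed to `approx_coverage` is only your own rough prior guess, used to sanity-check that you are somewhere near the 1X to 4X range.

```python
from pathlib import Path


def concatenate_fastq(inputs, output):
    """Append each input FASTQ file to `output`, in the order given.

    A newline is added after any file that does not end with one,
    so the last record of one file is not glued to the next file.
    """
    with open(output, "wb") as out:
        for path in inputs:
            last = b"\n"
            with open(path, "rb") as src:
                # Copy in 1 MiB chunks to avoid loading whole files in memory.
                while chunk := src.read(1 << 20):
                    out.write(chunk)
                    last = chunk[-1:]
            if last != b"\n":
                out.write(b"\n")
    return Path(output)


def count_bases(fastq_path):
    """Sum the sequence lengths, assuming strict 4-line FASTQ records."""
    total = 0
    with open(fastq_path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                total += len(line.strip())
    return total


def approx_coverage(total_bases, genome_length_guess):
    """Rough coverage = sequenced bases / guessed genome length (assumption)."""
    return total_bases / genome_length_guess


# Hypothetical usage with the file names from the question:
# merged = concatenate_fastq(["PE1.fq", "PE2.fq", "SE.fq"], "sample_all.fq")
# print(approx_coverage(count_bases(merged), genome_length_guess=500_000_000))
```

If you keep the combined file in a directory of its own and point RESPECT at that directory, you should get a single estimate for the sample instead of one per file. If the rough coverage comes out far above 4X, subsampling the reads first should make the run faster.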