rpetit3 / fastq-dl

Download FASTQ files from SRA or ENA repositories.

Question: checksum

MostafaYA opened this issue

Hi, very nice tool.
Just a question: does fastq-dl check the md5 (or another checksum) of downloaded reads to verify that the download was not corrupted for any reason?
Thanks, Mostafa

Hi @MostafaYA

For downloads coming from ENA, there is an md5sum check. If the md5sum cannot be verified, fastq-dl raises an error and exits.

You can override this and force the download even if the md5sum does not match. I added this because in very rare cases the md5sum reported by ENA did not match the actual md5sum.
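As a rough illustration of this kind of check, here is a minimal sketch in Python. It is not fastq-dl's actual code: `md5sum`, `verify_download`, and the `ignore_mismatch` parameter are hypothetical names, and the sketch assumes the expected md5 has already been obtained from the repository's metadata.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large FASTQ files are never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5, ignore_mismatch=False):
    """Hypothetical helper: compare a file's md5 with the expected value.

    Returns True on a match. On a mismatch it raises ValueError, unless
    ignore_mismatch is set, in which case it returns False instead.
    """
    observed = md5sum(path)
    if observed == expected_md5.lower():
        return True
    if ignore_mismatch:
        return False
    raise ValueError(
        f"MD5 mismatch for {path}: expected {expected_md5}, observed {observed}"
    )
```

A caller could let the `ValueError` propagate to stop the run, or pass `ignore_mismatch=True` for the rare case where the reported md5 itself is wrong.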

Now for SRA, I assume NCBI handles the check in fasterq-dump, because it handles the conversion of NCBI's SRA format to FASTQ.

Hope this helps, glad it's been useful for you!
Robert

thanks for the clarification.

Hi @MostafaYA

I'm going to go ahead and close this, please feel free to reopen!

Cheers,
Robert

@rpetit3 I wonder if maybe there should be two 'force' style options.

One to force overwriting existing files (--force), while still checking that the md5 matches, and another to ignore md5s (--ignore). I have an example where the file exists but is truncated, so the md5 doesn't match, yet fastq-dl doesn't overwrite it. I know I can use --force here, but then I don't get the security of the md5 check.

So with none of those options, the behaviour would be that the md5 of an existing file would be checked to see if it matches the expected one. With --force it would re-download the file, checking the md5. With --ignore it would download the file if it doesn't exist and not check the md5, or not check the md5 of an existing file. With both --force and --ignore you re-download and don't check md5.
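The proposed combinations can be sketched as a small decision function. This only illustrates the proposal, it is not fastq-dl's code: `plan_download` and its parameters are hypothetical names. Re-downloading an existing file whose md5 does not match when neither option is set is an assumption, since the comment does not state what should happen in that case.

```python
def plan_download(exists, md5_matches, force=False, ignore=False):
    """Hypothetical sketch of the proposed --force / --ignore behaviour.

    Returns (download, verify):
      download: whether the file should be (re-)downloaded
      verify:   whether any md5 check is performed at all
    """
    verify = not ignore
    if force or not exists:
        # --force always re-downloads; a missing file is always downloaded.
        return True, verify
    if ignore:
        # The file exists and md5s are ignored: keep it as it is.
        return False, False
    # The file exists and no option is set: check it, and (assumption)
    # re-download only when the md5 does not match.
    return (not md5_matches), True


# A truncated existing file under each combination of options.
for force in (False, True):
    for ignore in (False, True):
        plan = plan_download(True, False, force=force, ignore=ignore)
        print(f"force={force} ignore={ignore} -> download={plan[0]} verify={plan[1]}")
```

Under this sketch, the truncated-file case is caught without any option, while --force keeps the md5 check on the fresh download.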