ncbi / sra-tools

SRA Tools

Problem with fasterq-dump --split-3 on Ubuntu (v3.10.0): quality scores look wrong and differ from macOS (v3.0.1/v3.1.0)

Jyi-Yang opened this issue

Hi there,
I used fasterq-dump --split-3 SRR15347541 to download the FASTQ files from SRA, both on a Linux server and on my own laptop.
But when I checked the base quality scores, I found a problem.

On Mac (v3.0.1):
(qiime2-amplicon-2024.2) apple@Iris 2024_1 % head -20 SRR15347541_1_mac.fastq
@SRR15347541.1 1 length=250
TNCGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATGCAAGACAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATTTGTGACTGTATGGCTAGAGTACGGTAGAGGGGGATGGAATTCCGCGTGTAGCAGTGAAATGCGTAGATATGCGGAGGAACACCGATGGCGAAGGCAATCCCCTGGACCTGTACTGACGCTCATGCACGAAAGCGTGGGGAGCAAAC
+SRR15347541.1 1 length=250
C#>>AABCCFBCGGGGGGGGGGHGGGGGHHHHGHHHGGGGHHHGGGGGGGGGGGGFGEFFDFGE2GGFFF@GFBGGHHHHGGGGCGHFHHFGHHGFHGHHGHHHHHHHGFFDGHHFHFFGFG2@><?E?GGFDGGGG@GEG/DGHDGBA@DC:CCGBHF/BGHFFFFDA?AFBBGFGCCFG.9AFFF-9>DFB>-C@DE?FFBFFFBEFFFFFB/9/;FFBADDFFBFF/BAADFDFFFFFFFFDDAFFF

On Mac (v3.1.0):
@SRR15347541.1 1 length=250
TNCGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATGCAAGACAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATTTGTGACTGTATGGCTAGAGTACGGTAGAGGGGGATGGAATTCCGCGTGTAGCAGTGAAATGCGTAGATATGCGGAGGAACACCGATGGCGAAGGCAATCCCCTGGACCTGTACTGACGCTCATGCACGAAAGCGTGGGGAGCAAAC
+SRR15347541.1 1 length=250
C#>>AABCCFBCGGGGGGGGGGHGGGGGHHHHGHHHGGGGHHHGGGGGGGGGGGGFGEFFDFGE2GGFFF@GFBGGHHHHGGGGCGHFHHFGHHGFHGHHGHHHHHHHGFFDGHHFHFFGFG2@><?E?GGFDGGGG@GEG/DGHDGBA@DC:CCGBHF/BGHFFFFDA?AFBBGFGCCFG.9AFFF-9>DFB>-C@DE?FFBFFFBEFFFFFB/9/;FFBADDFFBFF/BAADFDFFFFFFFFDDAFFF

Both files above look normal.

On Linux:
@SRR15347541.1 1 length=250
TNCGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATGCAAGACAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATTTGTGACTGTATGGCTAGAGTACGGTAGAGGGGGATGGAATTCCGCGTGTAGCAGTGAAATGCGTAGATATGCGGAGGAACACCGATGGCGAAGGCAATCCCCTGGACCTGTACTGACGCTCATGCACGAAAGCGTGGGGAGCAAAC
+SRR15347541.1 1 length=250
??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

On Mac (file copied from Linux):
I transferred the FASTQ file from the Linux server to the Mac and checked it with head -20 SRR15347541_1_linux.fastq:

@SRR15347541.1 1 length=250
TNCGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATGCAAGACAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATTTGTGACTGTATGGCTAGAGTACGGTAGAGGGGGATGGAATTCCGCGTGTAGCAGTGAAATGCGTAGATATGCGGAGGAACACCGATGGCGAAGGCAATCCCCTGGACCTGTACTGACGCTCATGCACGAAAGCGTGGGGAGCAAAC
+SRR15347541.1 1 length=250
??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

It seems something is wrong with the quality scores.
I have tried both fasterq-dump --split-3 and fastq-dump --split-3, and both give the same problem.

Could you help me with this?
Thanks a lot.

You are dumping two different file formats of the same run:

  • on Mac - SRA Normalized Format files with full base quality scores,
  • on Linux - SRA Lite files with simplified base quality scores.
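
The "simplified base quality scores" can be seen directly in the quality lines. A minimal Python sketch, assuming the standard Phred+33 encoding (score = ASCII code minus 33), decodes the first ten quality characters of read 1 from each output above; decode_phred33 is a hypothetical helper, not part of SRA Tools. The Mac file yields varying per-base scores, while every '?' (ASCII 63) in the Linux file decodes to the same value, Q30.

```python
from typing import List


def decode_phred33(qual: str) -> List[int]:
    """Convert a Phred+33 quality string into integer quality scores.

    Assumes Phred+33 (Sanger / Illumina 1.8+) encoding: score = ord(char) - 33.
    """
    return [ord(ch) - 33 for ch in qual]


# First ten quality characters of read 1, copied from the outputs above
mac_qual = "C#>>AABCCF"    # normalized-format file (Mac)
linux_qual = "??????????"  # SRA Lite file (Linux)

print(decode_phred33(mac_qual))    # [34, 2, 29, 29, 32, 32, 33, 34, 34, 37]
print(decode_phred33(linux_qual))  # [30, 30, 30, 30, 30, 30, 30, 30, 30, 30]
```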

Run vdb-config --interactive on both systems. I think you will find that the "Prefer SRA Lite files ..." setting differs: it is on for the Linux host and off for the Mac. If so, that would explain the difference in the output.
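
To tell which kind of file an existing FASTQ came from without re-dumping it, one rough heuristic is to check whether every quality string consists of a single repeated character, as in the Linux output above. The sketch below is an assumption-based check, not an SRA Tools feature: looks_like_sra_lite is a hypothetical helper, and it assumes plain 4-line FASTQ records with no line-wrapped sequences.

```python
import sys
from itertools import islice


def looks_like_sra_lite(lines, max_records=1000):
    """Heuristic: True if every inspected record has a uniform quality line.

    Assumes 4-line FASTQ records (header, sequence, '+' line, quality).
    Only the first `max_records` records are inspected.
    """
    it = iter(lines)
    seen = 0
    while seen < max_records:
        record = list(islice(it, 4))
        if len(record) < 4:  # end of input (or a truncated final record)
            break
        qual = record[3].rstrip("\n")
        if len(set(qual)) > 1:  # more than one distinct score: not uniform
            return False
        seen += 1
    return seen > 0  # an empty file gives no evidence either way


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        print(looks_like_sra_lite(fh))
```

Saved as, say, check_lite.py, running it on SRR15347541_1_mac.fastq should print False, since read 1 already has varying scores; on SRR15347541_1_linux.fastq it should print True if the other records are uniform like read 1.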