lh3 / seqtk

Toolkit for processing sequences in FASTA/Q formats

Not sorting /1 and /2 reads properly.

jallmer opened this issue

I just cloned and built seqtk.

I wanted to use it to split a mixed FASTQ file into two files, one for each read of the pair.
According to seqtk seq -h, I can use -1 and -2 for that:
seqtk seq -1 toAssemble_mixedPairs.fastq > toAssemble_1.fastq
seqtk seq -2 toAssemble_mixedPairs.fastq > toAssemble_2.fastq

I used 'grep /1 toAssemble_2.fastq' to confirm that there is no /1 in that file. That seems fine.

The toAssemble_1.fastq file, however, contains /2 entries.
Counting them gives 70 million /1 reads and 280 million /2 reads (the mixed file is ~80 GB).

Any ideas?

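The input FASTQ has to be interleaved: -1 and -2 output the odd-numbered (2n-1) and even-numbered (2n) records by position, and do not look at the /1 or /2 suffix in the read names.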
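
Since -1 and -2 go by record position, a file that is not strictly interleaved would need to be split on the read names instead. Below is a minimal Python sketch of that idea, not something seqtk provides. It assumes each record is exactly four lines (no wrapped sequence lines) and that the mate is marked by a /1 or /2 suffix on the first whitespace-delimited token of the header line; the function name split_by_suffix is made up for illustration.

```python
def split_by_suffix(mixed, out1, out2):
    """Route FASTQ records to out1 or out2 by the /1 or /2 read-name suffix.

    Assumption: every record is exactly four lines (header, sequence, '+', quality).
    `mixed`, `out1` and `out2` are open text-mode file objects.
    Returns how many records went to each output and how many had no suffix.
    """
    counts = {"1": 0, "2": 0, "other": 0}
    while True:
        # Read one 4-line record; an empty first line means end of file.
        record = [mixed.readline() for _ in range(4)]
        if not record[0]:
            break
        # The read name is the first whitespace-delimited token of the header.
        fields = record[0].split()
        name = fields[0] if fields else ""
        if name.endswith("/1"):
            out1.writelines(record)
            counts["1"] += 1
        elif name.endswith("/2"):
            out2.writelines(record)
            counts["2"] += 1
        else:
            # Records without a recognizable suffix are skipped, only counted.
            counts["other"] += 1
    return counts
```

It could be called with toAssemble_mixedPairs.fastq opened for reading and the two output files opened for writing. The sketch only separates reads by suffix; it does not re-order them or match up mates, so with unequal /1 and /2 counts the outputs would still not be properly paired.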