lh3 / seqtk

Toolkit for processing sequences in FASTA/Q formats

Seqtk sample ignores random seed

MatthewRalston opened this issue

Hello,
I'm having trouble reproducing your example. I am certain this is not user error.
I tried to subsample 10k reads as follows:

>wc -l single_out.fq
2026312
>seqtk sample [-s $RANDOM] single_out.fq 10000 | wc -l
252
>seqtk sample [-s $RANDOM] single_out.fq 10000 | sha256sum
e279e6251a911ee24 ...
>seqtk sample [-s $RANDOM] single_out.fq 10000 | wc -l
252
>seqtk sample [-s $RANDOM] single_out.fq 10000 | sha256sum
e279e6251a911ee24 ...

#Also

>seqtk sample [-s $RANDOM] single_out.fq 1000 | wc -l
760
>seqtk sample [-s $RANDOM] single_out.fq 100000 | wc -l
760 # huh?

In contrast, the one-liners in the document linked below work fine when I set the sample size to 10000. They produce the correct number of reads (10000), and the output looks random: the checksums differ from run to run.

http://userweb.eng.gla.ac.uk/umer.ijaz/bioinformatics/subsampling_reads.pdf
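
To make the expected behavior concrete, here is a minimal sketch of a seeded subsample. It is my own illustration, not the code from the PDF above and not how seqtk implements `sample`. The function name `subsample_fastq` is hypothetical, and the sketch assumes strictly 4-line FASTQ records (no wrapped sequence lines). It uses plain reservoir sampling in awk, so the output order follows the reservoir slots rather than the input order.

```shell
# Hypothetical helper (NOT part of seqtk): reservoir-sample K reads from a
# FASTQ file with an explicit seed. Assumes exactly 4 lines per record.
# usage: subsample_fastq SEED FILE K
subsample_fastq() {
    awk -v seed="$1" -v k="$3" '
        BEGIN { srand(seed) }                # same seed, same awk => same sample
        # Accumulate the current 4-line record into one string.
        { rec = (NR % 4 == 1) ? $0 : rec "\n" $0 }
        NR % 4 == 0 {
            n++                              # number of complete records seen
            if (n <= k) {
                keep[n] = rec                # fill the reservoir first
            } else {
                j = int(rand() * n) + 1      # uniform integer in 1..n
                if (j <= k) keep[j] = rec    # replace a slot with probability k/n
            }
        }
        END {
            m = (n < k) ? n : k              # fewer than K reads: keep them all
            for (i = 1; i <= m; i++) print keep[i]
        }
    ' "$2"
}
```

Under the 4-line assumption, `subsample_fastq 42 single_out.fq 10000 | wc -l` should report 40000 lines whenever the input holds at least 10000 reads, and repeating the command with the same seed (and the same awk implementation) should give an identical checksum. That is the behavior I expected from `seqtk sample`: the number of output reads tracks the requested count, and the selection depends on the seed.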

I've tried running make clean and rebuilding, with no difference. Checking out the latest release commit did not change the subsampling behavior either.