bbuchfink / diamond

Accelerated BLAST compatible local sequence aligner.

Question about min-orf change for blastx in v2.1.0

Stikus opened this issue

Hello, thanks for the great tool.

We are using https://github.com/humanlongevity/HLA in our pipelines, and it provides great results but is unmaintained, unfortunately.

One of its scripts uses diamond blastx:

open(IN, "diamond blastx -t . -C 20000 --index-mode 1 --seg no --min-score 10 --top 20 -c 1 -d $root/data/hla -q $fastq_file -f tab --quiet -o /dev/stdout |") or die $!;

We changed this command following the advice in issue #140 (comment):

open(IN, "diamond blastx -t . --sensitive --masking 0 --min-score 10 --top 20 -c 1 -d $root/data/hla -q $fastq_file -f tab --quiet -o /dev/stdout |") or die $!;

But after this change in DIAMOND 2.1.0:

The blastx mode will now mask any open reading frame below the minimum required length as specified by --min-orf.
The blastx mode will only count unmasked letters towards the block size.

our results changed.

According to the wiki:

--min-orf/-l #

Ignore translated sequences that do not contain an open reading frame of at least this length. By default this feature is disabled for sequences of length below 30, set to 20 for sequences of length below 100, and set to 40 otherwise. Setting this option to 1 will disable this feature.
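The default thresholds quoted above can be written out as a small lookup. This is only a sketch of the wiki text, not DIAMOND's own code: `default_min_orf` is a hypothetical helper name, and the length argument is taken to be the sequence length in whatever units the wiki means.

```shell
# Hypothetical helper mirroring the wiki text quoted above; not part of DIAMOND.
# $1: sequence length. Prints the default --min-orf the wiki describes for it.
default_min_orf() {
    len=$1
    if [ "$len" -lt 30 ]; then
        echo "disabled"   # feature is off for sequences shorter than 30
    elif [ "$len" -lt 100 ]; then
        echo 20           # sequences of length 30 to 99
    else
        echo 40           # sequences of length 100 and above
    fi
}

default_min_orf 25    # prints: disabled
default_min_orf 75    # prints: 20
default_min_orf 150   # prints: 40
```

Passing `--min-orf 1` on the command line overrides these defaults and turns the filter off for every sequence, as the wiki says.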

We can add --min-orf 1 to the command. We tested it, and it brings back the old results. But should we? Can anyone give us advice?
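One way to see exactly which hits the option affects is to run the command twice and compare the outputs. The sketch below is a rough illustration: `run_blastx` and `only_in` are hypothetical helper names, the `db` and `query` paths are placeholders, and it assumes the first two columns of the tabular output are the query and subject IDs, as in the BLAST tabular format.

```shell
# Placeholder paths (assumptions); point these at your own database and reads.
db="data/hla"
query="reads.fastq"

# Run the blastx command from this issue, writing to a file instead of /dev/stdout.
# $1: output file; any further arguments are appended as extra options.
run_blastx() {
    out=$1; shift
    diamond blastx -t . --sensitive --masking 0 --min-score 10 --top 20 -c 1 \
        -d "$db" -q "$query" -f tab --quiet -o "$out" "$@"
}

# Print the query/subject pairs that appear in file $1 but not in file $2.
only_in() {
    a=$(mktemp); b=$(mktemp)
    cut -f1,2 "$1" | LC_ALL=C sort -u > "$a"
    cut -f1,2 "$2" | LC_ALL=C sort -u > "$b"
    LC_ALL=C comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Usage (requires diamond on PATH):
#   run_blastx default.tsv
#   run_blastx minorf1.tsv --min-orf 1
#   only_in minorf1.tsv default.tsv   # hits that only appear with --min-orf 1
```

This compares hit pairs only; scores and alignment coordinates for shared pairs may still differ between the two runs.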

Using this will produce more accurate results at the expense of longer runtime. So you should probably use it if run time is not an issue.

@bbuchfink results with --min-orf 1 are more accurate, correct? Thanks for the fast answer!

@bbuchfink results with --min-orf 1 are more accurate, correct?

yes

Thanks, closing issue