How to use -F option?
bede opened this issue
I am struggling to understand the useful-looking -F option, which allows one to pass a FASTA file from which k-mers are extracted and used as reads. I suspect I have misunderstood the manual:
-F k:<int>,i:<int>
Reads are substrings (k-mers) extracted from a FASTA file <s>. Specifically, for every reference sequence in FASTA file <s>, Bowtie 2 aligns the k-mers at offsets 1, 1+i, 1+2i, ... until reaching the end of the reference. Each k-mer is aligned as a separate read. Quality values are set to all Is (40 on Phred scale). Each k-mer (read) is given a name like <sequence>_<offset>, where <sequence> is the name of the FASTA sequence it was drawn from and <offset> is its 0-based offset of origin with respect to the sequence. Only single k-mers, i.e. unpaired reads, can be aligned in this way.
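To make the manual's description concrete, here is a minimal Python sketch of the sampling it describes. It is not Bowtie 2's implementation: the function name fasta_kmers and the sequence name chr1 are hypothetical, and it assumes that only full-length k-mers are produced, which the manual text does not spell out.

```python
def fasta_kmers(name, seq, k, i):
    """Yield (read_name, kmer) pairs for one reference sequence.

    Assumption: only full-length k-mers are produced, so a window that
    would run past the end of the sequence is skipped.
    """
    if k <= 0 or i <= 0:
        raise ValueError("k and i must be positive")
    for offset in range(0, len(seq) - k + 1, i):
        # Read names carry the 0-based offset, as the manual describes.
        yield f"{name}_{offset}", seq[offset:offset + k]


# "chr1" and its sequence are made-up illustration data.
reads = list(fasta_kmers("chr1", "ACGTACGTAC", k=4, i=2))
for read_name, kmer in reads:
    print(read_name, kmer)
```

Under these assumptions a reference of length L yields floor((L - k) / i) + 1 reads when L >= k, and none otherwise.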
I have unsuccessfully tried, for example, the following:
$ bowtie2 -x NC_029549.1 -f NC_029549.1.fa -F k:150,i:1
FASTA and FASTA sampling formats are mutually exclusive.
(ERR): bowtie2-align exited with value 1
Might someone be able to provide an example of how this feature should be used?
Thank you!
Thank you -- we are looking into this. I suspect the mention of <s> is a spurious holdover from the Bowtie 1 manual, and that we should have said <r> -- which is the variable we use in the Bowtie 2 manual to refer to the unpaired reads file specified with -U. We'll get a more definitive answer soon.
Hello,
Your command line was fine, with the exception that the k: and i: prefixes should be left out, so the option becomes -F 150,1. I have pushed a fix to the bug_fixes branch that should resolve the mutually exclusive error thrown when -f was specified with -F.
@BenLangmead -- we updated the -f option to behave like -q in that it is simply a flag that specifies the format of the input files to follow. That way a user can do something like this:
bowtie2 -x index -f -1 mate1.fa -2 mate2.fa
or bowtie2 -x index -q -1 mate1.fq -2 mate2.fq
or bowtie2 -x index -f --interleaved input.fa
or bowtie2 -x index -q --interleaved input.fq
In the case of FASTA-continuous, this allows any one of the following to be parsed the same way:
N.B. unpaired reads, -U, are the default in bowtie2
bowtie2 -x index -f -F 10,2 input.fa # fasta explicit, unpaired inferred
bowtie2 -x index -F 10,2 input.fa # fasta and unpaired are inferred
bowtie2 -x index -F 10,2 -U input.fa # fasta is inferred, unpaired explicit
bowtie2 -x index -f -F 10,2 -U input.fa # all explicitly specified
I hope this makes sense.
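For anyone scripting this, a small Python sketch of assembling the same invocation may help. The helper name bowtie2_fasta_sampling_cmd is hypothetical; it only uses the -x, -F and -U options shown in this thread, and it assumes a bowtie2 build that includes the fix from the bug_fixes branch.

```python
import shlex


def bowtie2_fasta_sampling_cmd(index, fasta, k, i):
    """Build an argument list for FASTA sampling (hypothetical helper).

    The -F value is written as "k,i" with no "k:" or "i:" prefixes,
    matching the form used in the commands above.
    """
    if k <= 0 or i <= 0:
        raise ValueError("k and i must be positive")
    return ["bowtie2", "-x", index, "-F", f"{k},{i}", "-U", fasta]


cmd = bowtie2_fasta_sampling_cmd("NC_029549.1", "NC_029549.1.fa", 150, 1)
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) on a machine where bowtie2 and the index are available; building it as a list avoids shell quoting problems with unusual file names.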
Speedy! Thank you both 🙏
Thanks for fixing the mutual exclusivity issue, as well as for correcting how I was using -F. The bug_fixes branch is now working as expected with bowtie2 -x NC_029549.1 -f NC_029549.1.fa -F 150,1