something wrong in bedtools getfasta -name
StayHungryStayFool opened this issue
BEDtools version 2.26.0.
Example as follows:
mm10-NlaIII.bed:
chr1 0 3000185 HIC_chr1_1@chr1:0-3000185
chr1 3000185 3000316 HIC_chr1_2@chr1:3000185-3000316
chr1 3000316 3000850 HIC_chr1_3@chr1:3000316-3000850
chr1 3000850 3001659 HIC_chr1_4@chr1:3000850-3001659
Command with the -name option:
bedtools getfasta -fi mm10.fasta -bed mm10-NlaIII.bed -name | fold -w 70
Resulting FASTA headers:
HIC_chr1_7512::chr1:4664567-4666090
HIC_chr1_7511::chr1:4664466-4664567
Command without the -name option:
bedtools getfasta -fi mm10.fasta -bed mm10-NlaIII.bed | fold -w 70
Resulting FASTA headers:
chr1:5006220-5006371
chr1:5006142-5006220
Can you help me with this?
Best wishes.
I see exactly the same thing with bedtools v2.29.2; it seems the behaviour of -name
has changed. Is it now expected that the FASTA header looks like, e.g.:
chr1.tRNA1-ValCAC-::chr1:16725515-16725688(+)
instead of what it used to be (I can't now remember which older version I was using):
chr1.tRNA1-ValCAC-(+)
It's true that the behaviour of -name has changed since 2.26.0. The output no longer matches what the getfasta documentation describes:
$ cat test.fa
>chr1
AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG
$ cat test.bed
chr1 5 10 myseq
$ bedtools getfasta -fi test.fa -bed test.bed -name
>myseq
AAACC
I am using 2.25.0 and it works as shown above. I really think the old behaviour is exactly what we need. However, I am not sure whether the change is a bug or intentional; if it is a bug, please fix it.
Sincere thanks!
It can be fixed by piping through sed:
$ bedtools getfasta -fi test.fa -bed test.bed -name | sed 's/::.*//'
but I would prefer not to have to do the extra step.
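For anyone who needs the workaround in the meantime, here is a slightly more defensive sketch of the same sed idea. It edits only header lines (those starting with `>`), and a second variant keeps a trailing strand suffix such as `(+)`. The function names `strip_coords` and `strip_coords_keep_strand` are hypothetical. The sample header `>myseq::chr1:5-10` is assumed from the `name::chrom:start-end` pattern reported above, not copied from real output. Newer releases also appear to list a `-nameOnly` option in `bedtools getfasta -h`, which may make this unnecessary, so check the help text of your installed version first.

```shell
# Hypothetical helper: drop the "::chrom:start-end..." suffix, header lines only.
strip_coords() {
  sed '/^>/ s/::.*//'
}

# Hypothetical variant: drop the coordinates but keep a trailing "(+)" or "(-)".
strip_coords_keep_strand() {
  sed '/^>/ s/::[^(]*//'
}

# Assumed sample input in the newer "name::chrom:start-end" header format.
printf '>myseq::chr1:5-10\nAAACC\n' | strip_coords

# Header taken from the report above, with a strand suffix (sequence is a placeholder).
printf '>chr1.tRNA1-ValCAC-::chr1:16725515-16725688(+)\nACGT\n' | strip_coords_keep_strand
```

With real data the same filters would sit at the end of the pipeline, e.g. `bedtools getfasta -fi test.fa -bed test.bed -name | strip_coords`. Both variants assume the name field itself does not contain `::`.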