arq5x / bedtools2

bedtools - the swiss army knife for genome arithmetic

something wrong in bedtools getfasta -name

StayHungryStayFool opened this issue:

I am using bedtools version 2.26.0.
An example follows.
mm10-NlaIII.bed:
chr1 0 3000185 HIC_chr1_1@chr1:0-3000185
chr1 3000185 3000316 HIC_chr1_2@chr1:3000185-3000316
chr1 3000316 3000850 HIC_chr1_3@chr1:3000316-3000850
chr1 3000850 3001659 HIC_chr1_4@chr1:3000850-3001659

Command with the -name option:
bedtools getfasta -fi mm10.fasta -bed mm10-NlaIII.bed -name | fold -w 70
Resulting FASTA headers:

HIC_chr1_7512::chr1:4664567-4666090
HIC_chr1_7511::chr1:4664466-4664567

Command without the -name option:
bedtools getfasta -fi mm10.fasta -bed mm10-NlaIII.bed | fold -w 70
Resulting FASTA headers:

chr1:5006220-5006371
chr1:5006142-5006220

Can you help me with this?
Best wishes.

I see exactly the same thing with bedtools v2.29.2; it seems the behaviour of -name has changed. Is it now expected that the FASTA header looks like this:

chr1.tRNA1-ValCAC-::chr1:16725515-16725688(+)

instead of what it used to be (I can't remember which older version I was using):

chr1.tRNA1-ValCAC-(+)

It's true that the behaviour of -name has changed since 2.26.0. The output no longer matches what the getfasta documentation shows:

$ cat test.fa
>chr1
AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG

$ cat test.bed
chr1 5 10 myseq

$ bedtools getfasta -fi test.fa -bed test.bed -name
>myseq
AAACC

I am using 2.25.0, and it works as shown above. I really think the old behaviour is exactly what we need. However, I am not sure whether the change is a bug or intentional; if it is a bug, please fix it.

Sincere thanks!

As a workaround, the extra coordinates can be stripped by piping the output through sed:

$ bedtools getfasta -fi test.fa -bed test.bed -name | sed 's/::.*//'

but I would prefer not to have to do the extra step.
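
One caveat with 's/::.*//' is that it also removes a trailing strand such as (+), which the older headers quoted above kept. Here is a minimal sketch of a variant that only touches header lines and leaves the strand in place. The function name strip_coords is hypothetical, and the sample input is typed by hand to mimic the newer -name output, so it can be tried without bedtools installed:

```shell
# Hypothetical helper (not part of bedtools): remove the "::chrom:start-end"
# part from FASTA header lines, keeping a trailing "(+)" or "(-)" if present.
strip_coords() {
  # Only lines starting with ">" are edited; "[^(]*" stops before the strand.
  sed '/^>/ s/::[^(]*//'
}

# Hand-written sample mimicking the newer -name output.
printf '>myseq::chr1:5-10\nAAACC\n' | strip_coords
# prints:
# >myseq
# AAACC

printf '>chr1.tRNA1-ValCAC-::chr1:16725515-16725688(+)\n' | strip_coords
# prints:
# >chr1.tRNA1-ValCAC-(+)
```

In real use this would go at the end of the pipeline, e.g. bedtools getfasta -fi test.fa -bed test.bed -name | strip_coords. Newer bedtools releases also appear to list a -nameOnly option in the output of bedtools getfasta -h; whether it is available depends on the installed version, so check the help text before relying on it.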