zwdzwd / transvar

TransVar - multiway annotator for precision medicine

Performance issue in faidx.py fetch_sequence

mmoisse opened this issue

I noticed a 10x performance drop compared to my older transvar version (https://bitbucket.org/wanding/transvar/commits/8a7a774618174bd591e8821b9c7c7fd5c03ce8c4) for some variants.
I traced the performance drop back to the addition of decode() in the fetch_sequence function, which converts seq from str to unicode. Apparently, concatenating unicode is much slower than concatenating str:

line=self.fasta_handle.readline().decode()
line=line[:-1] #Remove newline symbols
seq=seq+line

I suggest either concatenating the unicode only once, at the end of the loop, or removing the decode() call.
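A minimal sketch of the first option, for illustration only. It assumes the handle is opened in binary mode and that the number of lines to read is already known. The function name fetch_sequence_sketch and its parameters are hypothetical, not the actual faidx.py signature, and rstrip is used here in place of line[:-1] to drop the newline.

```python
import io


def fetch_sequence_sketch(handle, n_lines):
    """Read up to n_lines lines from a binary handle and return one str.

    Hypothetical helper: the raw lines are collected in a list, joined
    once, and decoded once, instead of decoding and concatenating
    inside the loop.
    """
    chunks = []
    for _ in range(n_lines):
        line = handle.readline()
        if not line:  # stop early at end of file
            break
        chunks.append(line.rstrip(b"\r\n"))  # remove newline symbols
    # Single join and single decode after the loop.
    return b"".join(chunks).decode()


# Usage with an in-memory stand-in for the FASTA file handle.
handle = io.BytesIO(b"ACGT\nTTGA\nCC\n")
print(fetch_sequence_sketch(handle, 3))  # ACGTTTGACC
```

The loop only appends to a list, so the sequence is copied once by join() rather than on every iteration.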

test.vcf.gz

transvar ganno --vcf test.vcf.gz --refversion hg19 --ccds 

Current version: 46.5366 s
Version without decode(): 5.30291 s
Version without one join(): 9.85117 s

I confirm that this patch significantly improves performance of transvar. Thanks!

Thanks for the suggestion and confirmation. Sorry for having missed this. Will merge and integrate soon.