Performance issue in faidx.py fetch_sequence
mmoisse opened this issue
I noticed a 10x performance drop compared to my older transvar version (https://bitbucket.org/wanding/transvar/commits/8a7a774618174bd591e8821b9c7c7fd5c03ce8c4) for some variants.
I traced the performance drop back to the addition of decode() in the fetch_sequence function, which converts seq from str to unicode. Apparently, concatenating unicode is much slower than concatenating str.
Lines 81 to 83 in 28a725d
I suggest either concatenating the unicode only once, at the end of the loop, or removing the decode() call altogether.
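A minimal sketch of the first option, for illustration only. It is written for Python 3, where bytes and str roughly play the roles of Python 2 str and unicode. The function names and the sample chunks are hypothetical and are not taken from faidx.py.

```python
def fetch_decode_per_chunk(chunks):
    """Decode every chunk and concatenate the text inside the loop."""
    seq = ""
    for chunk in chunks:
        # One decode plus one text concatenation per chunk.
        seq += chunk.decode("ascii")
    return seq


def fetch_decode_once(chunks):
    """Collect the raw chunks, then join and decode a single time."""
    parts = []
    for chunk in chunks:
        parts.append(chunk)
    # One join and one decode after the loop has finished.
    return b"".join(parts).decode("ascii")


# Hypothetical stand-in for the pieces read from an indexed FASTA file.
chunks = [b"ACGT", b"TTGA", b"CCNN"]
assert fetch_decode_per_chunk(chunks) == fetch_decode_once(chunks)
```

Both functions return the same text; they differ only in how many decode and concatenation steps run. Whether this matches the merged patch is not stated in this thread.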
transvar ganno --vcf test.vcf.gz --refversion hg19 --ccds
Current version: 46.5366 s
Version without decode(): 5.30291 s
Version without one join(): 9.85117 s
I confirm that this patch significantly improves performance of transvar. Thanks!
Thanks for the suggestion and confirmation. Sorry for having missed this. Will merge and integrate soon.