jerryji1993 / DNABERT

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

Home Page: https://doi.org/10.1093/bioinformatics/btab083

Over 512 length sequence

ninashenker opened this issue

Hi,

Thanks for the good work.

Is it possible to tweak the program to handle an input sequence longer than 512?
Also, does 512 refer to nucleotides or to k-mers (e.g. with 4-mers, could it take 4*512 nucleotides)?

Thanks,
Nina

Hi,

Please refer to #5 for information about handling input with length over 512.
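Here 512 refers to the number of nucleotides, although we use a k-mer representation for the input. Since the k-mers overlap (e.g. for ATCGAT, we get the 3-mers ATC, TCG, CGA, GAT), it makes almost no difference: a sequence of L nucleotides yields L - k + 1 k-mers, so the number of k-mers is nearly the same as the number of nucleotides.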

Best,
Jerry
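
To make the reply above concrete, here is a minimal Python sketch of overlapping k-mer extraction, plus one generic way to cut a longer sequence into windows of at most 512 nucleotides. The function names `to_kmers` and `split_windows`, and the `stride` value of 256, are hypothetical choices made for illustration. They are not taken from the DNABERT code base, and the windowing shown here is not necessarily the approach described in #5.

```python
from typing import List


def to_kmers(seq: str, k: int) -> List[str]:
    """Return the overlapping k-mers of seq, sliding one nucleotide at a time."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence of length L yields L - k + 1 k-mers (none if L < k).
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def split_windows(seq: str, max_len: int = 512, stride: int = 256) -> List[str]:
    """Split seq into windows of at most max_len nucleotides.

    Assumption for illustration: consecutive windows start `stride`
    nucleotides apart, so they overlap when stride < max_len.
    """
    if not 0 < stride <= max_len:
        raise ValueError("stride must be in the range 1..max_len")
    if len(seq) <= max_len:
        return [seq]
    windows = []
    start = 0
    while True:
        windows.append(seq[start:start + max_len])
        if start + max_len >= len(seq):
            break  # the last window reached the end of the sequence
        start += stride
    return windows


print(to_kmers("ATCGAT", 3))        # ['ATC', 'TCG', 'CGA', 'GAT']
print(len(to_kmers("A" * 512, 4)))  # 509
```

With these definitions, a 1000-nucleotide sequence is split into three windows starting at positions 0, 256 and 512, and each window can then be converted to k-mers separately. How the per-window outputs should be combined depends on the task and is left open here.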

Thank you for your quick response, I will look into it!

No problem, feel free to post if you have any other questions! Will close this issue for now.