jerryji1993 / DNABERT

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

Home Page: https://doi.org/10.1093/bioinformatics/btab083

Long DNA seqs embeddings

rominaappierdo opened this issue · comments

Hello and thank you for sharing your work.

I need to obtain DNA sequence embeddings, as in #11.
However, some of the sequences I would like to represent are longer than 512.

I read that I could split each sequence longer than 512 into sub-sequences and then concatenate their embeddings. However, the resulting embedding would then have a different length for each sequence, depending on how many sub-sequences it was split into, and I need all embeddings to have the same length.
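To make the length mismatch concrete, here is a minimal sketch. It does not call DNABERT: `embed_chunk` is a hypothetical stand-in that returns a toy 4-dimensional vector, and the 512 limit is applied to characters for simplicity (in the real model the limit would apply to tokens). The last function shows one possible workaround, averaging the chunk embeddings instead of concatenating them. That is an assumption for illustration, not a method confirmed by the DNABERT authors.

```python
from typing import List

MAX_LEN = 512  # length limit mentioned above (counted in characters here, an assumption)


def split_sequence(seq: str, max_len: int = MAX_LEN) -> List[str]:
    """Split seq into consecutive, non-overlapping chunks of at most max_len."""
    return [seq[i:i + max_len] for i in range(0, len(seq), max_len)]


def embed_chunk(chunk: str) -> List[float]:
    """Hypothetical stand-in for the model: the frequencies of A, C, G, T."""
    return [chunk.count(base) / len(chunk) for base in "ACGT"]


def concat_embedding(seq: str) -> List[float]:
    """Concatenate chunk embeddings: the length grows with the number of chunks."""
    out: List[float] = []
    for chunk in split_sequence(seq):
        out.extend(embed_chunk(chunk))
    return out


def mean_embedding(seq: str) -> List[float]:
    """Average chunk embeddings element-wise: the length is the same for any
    non-empty sequence."""
    vectors = [embed_chunk(chunk) for chunk in split_sequence(seq)]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


short_seq = "ACGT" * 100  # 400 characters, one chunk
long_seq = "ACGT" * 300   # 1200 characters, three chunks

print(len(concat_embedding(short_seq)), len(concat_embedding(long_seq)))
print(len(mean_embedding(short_seq)), len(mean_embedding(long_seq)))
```

Averaging gives every sequence a vector of the same size, but it discards the order of the chunks and weights a short final chunk as much as a full one, so whether it is acceptable depends on the downstream task.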

Is there any way you could help me achieve my goal?

Thank you in any case.