jerryji1993 / DNABERT

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

Home Page: https://doi.org/10.1093/bioinformatics/btab083

Add custom tokens to DNATokenizer

WENHUAN22 opened this issue

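Hi,
I am wondering whether it is possible to add custom tokens to DNATokenizer.

For example, with a Hugging Face fast tokenizer one can call
tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
new_tokenizer = tokenizer.train_new_from_iterator(owndata, vocab_size)
which trains a new BERT tokenizer whose vocabulary is learned from your own data.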
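
To make the question concrete, here is a minimal standalone sketch of what "adding tokens" means for a k-mer vocabulary of the kind DNABERT uses. It is not the actual DNATokenizer code: the helper names (`build_kmer_vocab`, `add_tokens`, `seq_to_kmers`, `encode`) and the example tokens `NNN` and `[SPECIES]` are hypothetical, and the special-token order is an assumption made for illustration.

```python
from itertools import product

# Assumed special tokens, placed first so they get the lowest ids.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def build_kmer_vocab(k):
    """Map the special tokens and every k-mer over A/C/G/T to an integer id."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return {tok: i for i, tok in enumerate(SPECIAL_TOKENS + kmers)}


def add_tokens(vocab, new_tokens):
    """Append tokens not yet in the vocabulary; return how many were added."""
    added = 0
    for tok in new_tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)  # next free id
            added += 1
    return added


def seq_to_kmers(seq, k):
    """Split a sequence into overlapping k-mers with stride 1."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def encode(tokens, vocab):
    """Look up token ids, falling back to [UNK] for unknown tokens."""
    return [vocab.get(t, vocab["[UNK]"]) for t in tokens]


vocab = build_kmer_vocab(3)
num_added = add_tokens(vocab, ["NNN", "[SPECIES]"])  # hypothetical custom tokens
ids = encode(seq_to_kmers("ACGNNN", 3), vocab)
```

In this sketch, k-mers that contain `N` but were not added explicitly still map to `[UNK]`. With a real pretrained model, new ids would also need matching embedding rows (in Hugging Face Transformers, `model.resize_token_embeddings(len(tokenizer))`). Whether DNATokenizer supports this directly is not confirmed here.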