jerryji1993 / DNABERT

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

Home Page: https://doi.org/10.1093/bioinformatics/btab083

Pre-train

morningsun77 opened this issue:

Hi,
I want to pre-train DNABERT with my own data, but I don't understand the template data at /example/sample_data/pre. Since the template data has no labels, I'd like to know whether all of the data in it are gene sequences.
Thanks.

You can find the format in DNABERT\examples\sample_data\pre. Open the text file '6_3k.txt' there to see how the input data is organized.
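If your own data are raw DNA strings, they need to be reshaped to match that file before pre-training. The sketch below assumes each line of '6_3k.txt' holds one unlabeled DNA sequence written as overlapping 6-mers (stride 1) separated by single spaces. That is an assumption, so compare it against the sample file before relying on it. The function names `seq_to_kmers` and `to_pretrain_lines` are hypothetical helpers, not part of the DNABERT codebase.

```python
from typing import Iterable, List


def seq_to_kmers(seq: str, k: int = 6) -> str:
    """Split one DNA sequence into overlapping k-mers (stride 1), space-separated.

    Hypothetical helper; assumes the pre-training file uses this k-mer layout.
    Sequences shorter than k produce an empty string.
    """
    seq = seq.strip().upper()
    if len(seq) < k:
        return ""
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))


def to_pretrain_lines(seqs: Iterable[str], k: int = 6) -> List[str]:
    """Convert raw sequences to one k-mer line each, skipping ones too short for k."""
    lines = []
    for s in seqs:
        line = seq_to_kmers(s, k)
        if line:
            lines.append(line)
    return lines


if __name__ == "__main__":
    # Example: an 8-base sequence yields 8 - 6 + 1 = 3 overlapping 6-mers.
    print(seq_to_kmers("ATGCGTAC"))  # ATGCGT TGCGTA GCGTAC
```

To produce a file, join the result of `to_pretrain_lines` with "\n" and write it out. No labels are needed, since the pre-training data in the template is unlabeled.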