jerryji1993 / DNABERT

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

Home Page: https://doi.org/10.1093/bioinformatics/btab083

How to divide our own dataset into test, dev and train data and assign them labels for fine tuning process

smruti241 opened this issue · comments

Hi @jerryji1993 , @Zhihan1996 , @project-delphi , @hjgwak , @timlautk ,

I read your paper and it's very interesting. I have a dataset that consists of 6-mers only. I want to split it into train, dev, and test sets and assign labels so that I can fine-tune directly (no pre-training needed; I will use the pre-trained models). Could you tell me the procedure, or whether a script for this is available in the repository? Please let me know. Thanks!

Hi, yes, there is a way to load the models with HuggingFace. I have done it in this repository: https://github.com/Moeinh77/Virus-DNA-Classification

@Moeinh77 can you please tell me how to use it? I didn't fully understand. I already have k-mer data (6-mers) and want to fine-tune the pre-trained models, but my k-mer data doesn't have labels yet.
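
For reference, the split-and-label step asked about above can be sketched in plain Python. This is a minimal sketch under several assumptions, not a script from the DNABERT repository: the toy sequences, the 80/10/10 ratios, the `ft_data` output directory, and the helper names `split_dataset` and `write_tsv` are all hypothetical. The labels themselves cannot be derived from the k-mers; they have to come from what you know about each sequence (here, which of two groups it belongs to). The tab-separated layout with a `sequence` and `label` header is an assumption about what the fine-tuning script reads, so check it against the sample data shipped with the repository before relying on it.

```python
import csv
import random
from pathlib import Path


def split_dataset(examples, dev_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle (sequence, label) pairs and split them into train/dev/test."""
    rows = list(examples)
    random.Random(seed).shuffle(rows)  # seeded so the split is reproducible
    n_test = int(len(rows) * test_frac)
    n_dev = int(len(rows) * dev_frac)
    test = rows[:n_test]
    dev = rows[n_test:n_test + n_dev]
    train = rows[n_test + n_dev:]
    return train, dev, test


def write_tsv(rows, path):
    """Write rows as a tab-separated file with a 'sequence<TAB>label' header."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["sequence", "label"])  # assumed header, verify it
        writer.writerows(rows)


# Hypothetical toy data: each sequence is a space-separated string of 6-mers.
# The label (1 or 0) is assigned from the group the sequence came from.
positive_seqs = ["ATGCGT TGCGTA GCGTAC"] * 10
negative_seqs = ["TTTTTT TTTTTA TTTTAA"] * 10
examples = [(s, 1) for s in positive_seqs] + [(s, 0) for s in negative_seqs]

train, dev, test = split_dataset(examples)

if __name__ == "__main__":
    out_dir = Path("ft_data")  # hypothetical output directory
    out_dir.mkdir(exist_ok=True)
    write_tsv(train, out_dir / "train.tsv")
    write_tsv(dev, out_dir / "dev.tsv")
    write_tsv(test, out_dir / "test.tsv")
```

With 20 toy examples this gives 16 training, 2 dev, and 2 test rows. For an imbalanced dataset, a stratified split (splitting each class separately and then concatenating) may be preferable to the plain shuffle used here.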