How to divide our own dataset into train, dev and test data and assign labels for the fine-tuning process

Question

smruti241 opened this issue a year ago

Hi @jerryji1993 , @Zhihan1996 , @project-delphi , @hjgwak , @timlautk ,

I read your paper and it's very interesting. I have a dataset that consists of 6-mers only. I want to divide it into train, dev and test data and assign labels so that I can fine-tune directly (no pre-training required; I will use the pre-trained models). Can you please tell me the procedure, or whether a script for this is available in the folders of this tool? Please let me know. Thanks!

Moein Hasani · Answer 1 · Tue Mar 21 2023 01:56:59 GMT+0800 (China Standard Time)

Hi, yes, there is a way to load the models with HuggingFace. I have done it in this repository: https://github.com/Moeinh77/Virus-DNA-Classification

Smruti Panda · Answer 2 · Tue Mar 21 2023 02:57:32 GMT+0800 (China Standard Time)

@Moeinh77 can you please tell me how to use it? I didn't understand it properly. I already have k-mer data (6-mers), and I want to use the pre-trained models for fine-tuning. My k-mer data does not have labels yet.
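
For readers with the same question, here is a minimal sketch of one way to attach labels and make a train/dev/test split. It is not a script from this repository. It assumes you already know which class each sequence belongs to (labels cannot be derived from the k-mers themselves), that each sequence is a single line of space-separated 6-mers, and that the fine-tuning code accepts tab-separated files with a `sequence` and a `label` column. Please check the sample data shipped with the tool to confirm that layout. The names `toy_kmers`, `stratified_split`, `write_tsv`, and the `ft_data` folder are hypothetical.

```python
import csv
import os
import random
from collections import defaultdict


def toy_kmers(rng, n_tokens=5, k=6):
    """Make one fake line of space-separated k-mers (placeholder for real data)."""
    seq = "".join(rng.choice("ACGT") for _ in range(n_tokens + k - 1))
    return " ".join(seq[i:i + k] for i in range(n_tokens))


def stratified_split(records, dev_frac=0.1, test_frac=0.1, seed=42):
    """Split (sequence, label) pairs into train/dev/test, keeping class ratios."""
    by_label = defaultdict(list)
    for seq, label in records:
        by_label[label].append((seq, label))

    rng = random.Random(seed)
    train, dev, test = [], [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_dev = round(len(items) * dev_frac)
        n_test = round(len(items) * test_frac)
        dev.extend(items[:n_dev])
        test.extend(items[n_dev:n_dev + n_test])
        train.extend(items[n_dev + n_test:])
    # Shuffle again so the classes are interleaved inside each split.
    for part in (train, dev, test):
        rng.shuffle(part)
    return train, dev, test


def write_tsv(path, rows):
    """Write rows as a TSV with an assumed 'sequence<TAB>label' header."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["sequence", "label"])
        writer.writerows(rows)


# Toy data: replace these two lists with your own sequences, grouped by class.
rng = random.Random(0)
positives = [toy_kmers(rng) for _ in range(20)]
negatives = [toy_kmers(rng) for _ in range(20)]

# Assign labels: 1 for the positive class, 0 for the negative class.
records = [(s, 1) for s in positives] + [(s, 0) for s in negatives]

train, dev, test = stratified_split(records, dev_frac=0.1, test_frac=0.1)

out_dir = "ft_data"  # hypothetical output folder
os.makedirs(out_dir, exist_ok=True)
write_tsv(os.path.join(out_dir, "train.tsv"), train)
write_tsv(os.path.join(out_dir, "dev.tsv"), dev)
write_tsv(os.path.join(out_dir, "test.tsv"), test)
```

The split is done per class so that a rare class does not end up missing from the dev or test set. The 80/10/10 proportions are only a common default, not a recommendation from the authors.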