fine tune on a new dataset
un-lock-me opened this issue · comments
Hi @thu-coai @hzhwcmhf @MaLiN2223 @zqwerty @xiaotianzi @truthless11 and thanks so much for making your code available.
I want to fine-tune the model on a new dataset whose format is very similar to the IMDB dataset (each example has a couple of sentences, and the label is positive/negative/neutral). Could you please advise on what changes I need to make?
I appreciate your time and help :).
Another question: to preprocess the new dataset, do I need to run all the scripts at this link: https://github.com/thu-coai/SentiLARE/tree/master/preprocess
If so, is there a required order for running them?
Thanks :)
Hi, I suggest following these steps to adapt our code to your own dataset:
- Prepare your own dataset in the same format as our provided raw datasets, such as IMDB. Links to download the raw and preprocessed datasets are provided in the README.
- Preprocess the raw dataset with our code. If your task is sentence-level sentiment classification, refer to prep_sent.py. You may need additional files such as SentiWordNet and the representations of its glosses; these are mentioned in our code.
- Run the classification code on your own dataset just as you would on IMDB. You may need to modify some arguments, such as the data path.
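
As a rough illustration of the first step, here is a minimal sketch of turning (text, label) pairs into one example per line with integer labels. The output layout (text, a tab, then a label id), the label mapping, and the function names are all assumptions made for illustration, not the format our scripts expect; check the raw IMDB files linked in the README and match them. Also note that your data has three classes while IMDB is binary, so any setting in the classification code that fixes the number of labels would probably need to change as well.

```python
import io

# Hypothetical mapping; adjust the ids to whatever the target format expects.
LABEL_TO_ID = {"negative": 0, "neutral": 1, "positive": 2}


def convert_rows(rows):
    """Map (text, label) pairs to (single-line text, integer label id)."""
    converted = []
    for text, label in rows:
        key = label.strip().lower()
        if key not in LABEL_TO_ID:
            raise ValueError(f"unknown label: {label!r}")
        # Collapse newlines, tabs and repeated spaces so each example fits on one line.
        clean_text = " ".join(text.split())
        converted.append((clean_text, LABEL_TO_ID[key]))
    return converted


def write_examples(examples, handle):
    """Write one example per line as: text, a tab, then the label id (assumed layout)."""
    for text, label_id in examples:
        handle.write(f"{text}\t{label_id}\n")


if __name__ == "__main__":
    sample = [
        ("Great movie.\nLoved it.", "Positive"),
        ("It was fine.", "neutral"),
    ]
    buffer = io.StringIO()
    write_examples(convert_rows(sample), buffer)
    print(buffer.getvalue(), end="")
```

Raising on an unknown label, rather than skipping the row, makes it easier to spot typos in the label column before preprocessing starts.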
Hope this can help you.