thu-coai / SentiLARE

Codes for our paper "SentiLARE: Sentiment-Aware Language Representation Learning with Linguistic Knowledge" (EMNLP 2020)

fine tune on a new dataset

un-lock-me opened this issue

commented

Hi @thu-coai @hzhwcmhf @MaLiN2223 @zqwerty @xiaotianzi @truthless11 and thanks so much for making your code available.
I want to fine-tune the model on a new dataset whose format is very similar to the IMDB dataset (each example has a couple of sentences, and the label is positive/negative/neutral). Could you please advise on what changes I need to make?

I appreciate your time and help :).

commented

Another question: to preprocess the new dataset, do I need to run all the scripts at this link: https://github.com/thu-coai/SentiLARE/tree/master/preprocess
If so, is there a particular order in which to run them?

Thanks :)

Hi, I suggest following these steps to adapt our code to your own dataset:

  1. Prepare your own dataset in the same format as the raw datasets we provide, such as IMDB. The link to download the raw / preprocessed datasets is in the README.
  2. Preprocess the raw dataset with our code. If your task is sentence-level sentiment classification, refer to prep_sent.py. You may also need additional files such as SentiWordNet and the representations of its glosses; these are mentioned in our code.
  3. Run the classification code on your own dataset just as you would on IMDB. Some arguments, such as the data path, may need to be modified.
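
Step 1 could be sketched roughly as below. This is only an illustration, not the repository's own tooling: the label-to-id mapping, the tab-separated `text<TAB>label` layout, the split ratios, and the helper names (`convert`, `split`, `write_tsv`) are all assumptions, so the output should be checked against the raw IMDB files from the README before running prep_sent.py. Note also that IMDB is a binary task, so a positive/negative/neutral dataset may require changing the number of labels in the classification code.

```python
import random

# Hypothetical label mapping: the ids actually expected by the
# preprocessing scripts should be verified against the raw IMDB files.
LABEL_TO_ID = {"negative": 0, "neutral": 1, "positive": 2}


def normalize(text):
    """Collapse all whitespace (including newlines and tabs) to single spaces."""
    return " ".join(text.split())


def convert(rows):
    """Map (text, label) pairs to (clean_text, label_id), skipping
    empty texts and labels that are not in LABEL_TO_ID."""
    out = []
    for text, label in rows:
        key = label.strip().lower()
        clean = normalize(text)
        if clean and key in LABEL_TO_ID:
            out.append((clean, LABEL_TO_ID[key]))
    return out


def split(examples, dev_ratio=0.1, test_ratio=0.1, seed=42):
    """Shuffle with a fixed seed and split into (train, dev, test)."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    n_dev = int(len(examples) * dev_ratio)
    n_test = int(len(examples) * test_ratio)
    dev = examples[:n_dev]
    test = examples[n_dev:n_dev + n_test]
    train = examples[n_dev + n_test:]
    return train, dev, test


def write_tsv(path, examples):
    """Write one `text<TAB>label_id` line per example (assumed layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for text, label_id in examples:
            f.write(f"{text}\t{label_id}\n")


if __name__ == "__main__":
    demo = [
        ("Great  movie,\nloved it!", "Positive"),
        ("It was fine.", "neutral"),
        ("Terrible pacing.", "negative"),
    ]
    train, dev, test = split(convert(demo), dev_ratio=0.0, test_ratio=0.0)
    print(len(train), len(dev), len(test))
```

Calling `write_tsv` on each split would then produce the files to pass to the preprocessing step, with file names chosen to match whatever the raw IMDB directory uses.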

Hope this helps.