devesh-002 / nlp_project


Transformer Models for Text Coherence Assessment

In this repository, we work on Text Coherence Assessment, following this paper.

Download the preprocessed dataset from here and place it in the processed_data folder.

To train the model, use the run.sh file: change the parameters in it as required, then run the script.

The arguments are as follows:

  • corpus can take one of 'gcdc' or 'wsj'.
  • sub_corpus can take any one of 'Clinton', 'Enron', 'Yelp' or 'Yahoo' if corpus is gcdc.
  • arch can take one of vanilla or hierarchical.
  • task can take one of 3-way-classification, minority-classification, sentence-ordering or sentence-score-prediction for the GCDC dataset, and only sentence-ordering for the WSJ dataset.
  • model_name defines the transformer model to use (by default, it is roberta-base).

To train a custom model:

bash try.sh

To make changes, edit the following command in try.sh:

python3 main.py --arch <arch_name> --corpus <corpus_name> --task <task_name>
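The argument constraints listed above can be summarized as a small validation sketch. It is hypothetical: the function name `validate_args` and the lookup tables are illustrative and are not taken from main.py. It also assumes, from the wording of the list, that `sub_corpus` is required whenever `corpus` is gcdc and is ignored for wsj.

```python
"""Hypothetical sketch of the argument constraints described in this README.

Option names and allowed values come from the README; the tables and the
validate_args function are illustrative, not main.py's actual code.
"""

# Allowed values as listed in the README (assumed to be exhaustive).
CORPORA = {"gcdc", "wsj"}
GCDC_SUB_CORPORA = {"Clinton", "Enron", "Yelp", "Yahoo"}
ARCHS = {"vanilla", "hierarchical"}
TASKS_BY_CORPUS = {
    "gcdc": {
        "3-way-classification",
        "minority-classification",
        "sentence-ordering",
        "sentence-score-prediction",
    },
    "wsj": {"sentence-ordering"},  # WSJ supports only sentence ordering
}


def validate_args(corpus, arch, task, sub_corpus=None):
    """Return a list of error messages; an empty list means the combination is valid."""
    errors = []
    if corpus not in CORPORA:
        errors.append(f"unknown corpus {corpus!r}")
        return errors  # the remaining checks depend on a known corpus
    if arch not in ARCHS:
        errors.append(f"unknown arch {arch!r}")
    if task not in TASKS_BY_CORPUS[corpus]:
        errors.append(f"task {task!r} is not supported for corpus {corpus!r}")
    # Assumption: a GCDC run must name one of the four sub-corpora.
    if corpus == "gcdc" and sub_corpus not in GCDC_SUB_CORPORA:
        errors.append(f"gcdc needs sub_corpus in {sorted(GCDC_SUB_CORPORA)}")
    return errors


if __name__ == "__main__":
    print(validate_args("gcdc", "hierarchical", "3-way-classification", "Yelp"))  # []
    print(validate_args("wsj", "vanilla", "3-way-classification"))
```

Keeping the allowed tasks in a per-corpus table makes the WSJ restriction (sentence-ordering only) a data change rather than extra branching.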

To evaluate on a dataset, run:

bash infer.sh

To change the inference settings, edit the following command in infer.sh:

python3 main.py --sub_corpus <name if gcdc> --inference --arch <arch_name> --corpus <dataset_name> --freeze_emb_layer --task <task_name> --checkpoint_path <saved_checkpoint_path>
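As a sketch of how the inference flags fit together, the hypothetical helper below assembles the command above as an argument list. The name `build_inference_command` and the example checkpoint path are illustrative assumptions; the helper only reproduces the flags shown in the README, and it assumes `--sub_corpus` is passed only when the corpus is gcdc.

```python
import shlex


def build_inference_command(arch, corpus, task, checkpoint_path, sub_corpus=None):
    """Assemble the README's inference command as an argv list (hypothetical helper)."""
    cmd = ["python3", "main.py"]
    if corpus == "gcdc":
        # Assumption: --sub_corpus is only needed for GCDC (Clinton, Enron, Yelp, Yahoo).
        if sub_corpus is None:
            raise ValueError("sub_corpus is required when corpus is gcdc")
        cmd += ["--sub_corpus", sub_corpus]
    cmd += ["--inference", "--arch", arch, "--corpus", corpus]
    cmd.append("--freeze_emb_layer")  # present in the README's inference command
    cmd += ["--task", task, "--checkpoint_path", checkpoint_path]
    return cmd


if __name__ == "__main__":
    # "checkpoints/model.pt" is a placeholder path, not a file shipped with the repo.
    argv = build_inference_command(
        "hierarchical", "gcdc", "3-way-classification",
        "checkpoints/model.pt", sub_corpus="Yelp",
    )
    print(shlex.join(argv))
```

Returning a list rather than one string lets the result be passed directly to `subprocess.run` without shell-quoting issues; `shlex.join` (Python 3.8+) is used here only to print it.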

We have also made the trained models available here.

About

License: MIT License

Languages: Python 91.4%, Shell 8.6%