RoiB / CS-7650-Project

multi-word bias neutralization

From the project root folder, run the following:
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ python
>>> import nltk; nltk.download("punkt")

You need to download the pretrained BERT model.

  1. Download the BERT pretrained model from S3:
     1.1 wget
     1.2 mv bert-base-uncased-pytorch_model.bin pytorch_model.bin
  2. Download the BERT config file from S3:
     2.1 wget
     2.2 mv bert-base-uncased-config.json config.json
  3. Download the BERT vocab file from S3:
     3.1 wget
     3.2 mv bert-base-uncased-vocab.txt bert_vocab.txt
  4. Rename:
    • bert-base-uncased-pytorch_model.bin to pytorch_model.bin
    • bert-base-uncased-config.json to config.json
    • bert-base-uncased-vocab.txt to bert_vocab.txt
  5. Place the model, config, and vocab files into the ./src/strongClassifier/pybert/pretrain/bert/base-uncased directory.
  6. Convert your data to the Kaggle data format and place it in pybert/dataset.
    • You can modify the … to adapt it to your data.
  7. Run python --do_data to preprocess the data.
  8. Run python --do_train --save_best --do_lower_case to fine-tune the BERT model.
  9. Run --do_test --do_lower_case to make predictions on new data.
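The renaming and placement in steps 1 to 5 can be scripted. The following is a minimal sketch, assuming the three files were downloaded into a single folder under their original S3 names (or already renamed with `mv`); `install_bert_files`, `RENAMES` and `TARGET_DIR` are hypothetical names for illustration and are not part of this repository.

```python
"""Hypothetical helper: rename the downloaded BERT files and move them into place."""
from __future__ import annotations

import shutil
from pathlib import Path

# Original S3 file name -> file name expected by the project (steps 1-4).
RENAMES = {
    "bert-base-uncased-pytorch_model.bin": "pytorch_model.bin",
    "bert-base-uncased-config.json": "config.json",
    "bert-base-uncased-vocab.txt": "bert_vocab.txt",
}

# Target directory from step 5, relative to the project root.
TARGET_DIR = Path("src/strongClassifier/pybert/pretrain/bert/base-uncased")


def install_bert_files(download_dir: Path, project_root: Path) -> list[Path]:
    """Move each downloaded file into TARGET_DIR under its expected name.

    Returns the installed paths; raises FileNotFoundError if a file is missing.
    """
    target = project_root / TARGET_DIR
    target.mkdir(parents=True, exist_ok=True)
    installed = []
    for src_name, dst_name in RENAMES.items():
        src = download_dir / src_name
        # Fall back to a file that was already renamed with `mv`.
        if not src.exists():
            src = download_dir / dst_name
        if not src.exists():
            raise FileNotFoundError(
                f"missing {src_name} (or {dst_name}) in {download_dir}"
            )
        dst = target / dst_name
        shutil.move(str(src), str(dst))
        installed.append(dst)
    return installed
```

For example, calling `install_bert_files(Path.home() / "Downloads", Path("."))` from the project root would move all three files into the base-uncased directory. Checking for both the original and the renamed file name lets the helper run whether or not the `mv` sub-steps were already done.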

You need to download the pretrained GloVe embeddings.

Download the GloVe archive, unzip it, and put glove.6B.100d.txt in ./src/seq2seq/.
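Once glove.6B.100d.txt is in place, it can be read with a few lines of standard-library Python. This is a minimal sketch, not the loader used in ./src/seq2seq/; it assumes the usual GloVe text layout (one token per line followed by its 100 space-separated float components), and `load_glove` with its optional `vocab` filter is a hypothetical helper.

```python
"""Minimal sketch: load GloVe vectors from the plain-text embedding file."""
from __future__ import annotations

from pathlib import Path

# Location given in this README, relative to the project root.
GLOVE_PATH = Path("src/seq2seq/glove.6B.100d.txt")


def load_glove(
    path: Path, dim: int = 100, vocab: set[str] | None = None
) -> dict[str, list[float]]:
    """Return a token -> vector mapping.

    If `vocab` is given, only tokens in that set are kept.
    Raises ValueError if a kept line does not have exactly `dim` components.
    """
    embeddings: dict[str, list[float]] = {}
    with path.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if vocab is not None and word not in vocab:
                continue
            if len(values) != dim:
                raise ValueError(
                    f"expected {dim} values for {word!r}, got {len(values)}"
                )
            embeddings[word] = [float(v) for v in values]
    return embeddings
```

For example, `load_glove(GLOVE_PATH, vocab={"the", "nurse"})` would return only those two vectors. Filtering by vocabulary keeps memory use low, since the 6B files cover a 400,000-word vocabulary; vectors are plain lists here and could be converted to NumPy arrays if the downstream model needs them.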

You need to install the R software environment and its packages "mclust" and "rjson".

Structure of the code

Please consult main.h to see how to run the code for each model. You may need to change file names during experiments.

At the root of the project, you will see:

data: contains the data used to train, validate, and test our system.

src: contains model implementations of our system, data analysis scripts and model run results.

baseline: test results from the baseline models.


Languages: Python 77.2%, Jupyter Notebook 22.1%, R 0.4%, Shell 0.3%