robustness-albert

This is the code for "How Emotionally Stable is ALBERT? Testing Robustness with Stochastic Weight Averaging on a Sentiment Analysis Task".

models. results.

setup.

To run the training and evaluation for this paper, please set up the environment:

# Create environment.
conda create -n robustness-albert python=3.7
conda activate robustness-albert

# Install packages.
python setup.py develop
pip install -r requirements.txt

training.

First, create a config file (see configs/example_config.json for an example).

Then, run the following:

python robustness_albert/train.py -c configs/CONFIG_FILE_NAME.json
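
For orientation, here is a minimal sketch of how a script could read a JSON config passed with the -c flag. It is not the actual code in robustness_albert/train.py: the function names parse_args and load_config and the long option --config are assumptions made for illustration, and no config keys are assumed (see configs/example_config.json for the real fields).

```python
import argparse
import json
from pathlib import Path


def parse_args(argv=None):
    """Parse a -c option pointing at a JSON config file.

    Hypothetical helper: the real train.py may define its arguments differently.
    """
    parser = argparse.ArgumentParser(description="Load a JSON training config.")
    parser.add_argument(
        "-c",
        "--config",  # the long form is an assumption, only -c appears in the README
        type=Path,
        required=True,
        help="Path to a JSON config file.",
    )
    return parser.parse_args(argv)


def load_config(path):
    """Read the JSON file at `path` and return its contents as a dict."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    if not isinstance(config, dict):
        raise ValueError(f"Expected a JSON object at the top level of {path}")
    return config
```

With these helpers, `load_config(parse_args(["-c", "configs/CONFIG_FILE_NAME.json"]).config)` would return the config as a plain dictionary.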

linting & unit testing.

For linting and unit testing, run the following commands:

# Linting.
flake8 robustness_albert

# Unit testing. 
pytest -s tests/

notebooks.

test_models_checklist.ipynb: This notebook carries out the CheckList tests on all the random seeds.

dev_set_results.ipynb: This notebook loads the results of the different random seeds on the development set of SST-2 and calculates the Fleiss' Kappa agreement between the models.

extract_names_sst2.ipynb: This notebook extracts and saves the names that occur in the movie reviews of the train and test sets, so that they can be used in the designed CheckList capabilities. The resulting names can be found in assets/names_sst2_train.json and assets/names_sst2_test.json.

plot_checklist_results.ipynb: This notebook plots the results achieved from the CheckList tests for all random seeds. It plots the error rates and overlap ratios and calculates the Fleiss' Kappa agreement.

extract_test_labels_sst2.ipynb: As the SST-2 dataset in HuggingFace does not come with the test labels, this notebook extracts them from the original SST-2 data from GLUE. The labels can be found in assets/sst2_test_labels.json.

create_checklist_tests_sst2.ipynb: This notebook creates the CheckList test suite that we use for the results. The resulting test suite can be found in assets/testset_19_07_21.pkl.
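
Several of the notebooks above report the Fleiss' Kappa agreement between the models trained with different random seeds. As a rough illustration of what that statistic measures, here is a minimal pure-Python sketch of the standard Fleiss' Kappa formula. It is not the code used in the notebooks: the function name fleiss_kappa, the input layout (one list of predicted labels per example, one label per seed), and the example predictions are all assumptions made for illustration.

```python
import math


def fleiss_kappa(ratings, categories=(0, 1)):
    """Compute Fleiss' Kappa for a list of items rated by a fixed number of raters.

    ratings: list of items; each item is a list with one label per rater
             (here: one predicted label per random seed).
    categories: the possible labels (0/1 for binary sentiment is an assumption).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("Fleiss' Kappa needs at least two raters.")
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("Every item must be rated by the same number of raters.")

    # counts[i][j] = number of raters that assigned item i to category j.
    counts = [[item.count(cat) for cat in categories] for item in ratings]

    # Observed agreement: mean over items of the fraction of agreeing rater pairs.
    per_item = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Expected agreement by chance, from the overall share of each category.
    shares = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_expected = sum(share * share for share in shares)

    if math.isclose(p_expected, 1.0):
        # Every rating falls in one category, so Kappa is undefined (0 / 0).
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical predictions of three seeds on four examples.
example_predictions = [[1, 1, 1], [0, 0, 0], [1, 0, 1], [0, 0, 1]]
kappa = fleiss_kappa(example_predictions)
```

A value of 1 means the seeds agree on every example, while values near 0 mean the agreement is about what chance alone would produce.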
