CounterfactualGAN

Code accompanying the paper Generating Realistic Natural Language Counterfactuals (Marcel Robeer, Floris Bex and Ad Feelders, 2021).

Abstract

Counterfactuals are a valuable means for understanding decisions made by machine learning (ML) systems. However, the counterfactuals generated by the methods currently available for natural language text are either unrealistic or introduce imperceptible changes. We propose CounterfactualGAN: a method that combines a conditional GAN and the embeddings of a pretrained BERT encoder to model-agnostically generate realistic natural language text counterfactuals for explaining regression and classification tasks. Experimental results show that our method produces perceptibly distinguishable counterfactuals, while outperforming four baseline methods on fidelity and human judgments of naturalness, across multiple datasets and multiple predictive models.

Software

Download the software directly at https://aclanthology.org/2021.findings-emnlp.306, under Software. Details on the hyperparameters used and on their tuning are included in Appendix A of the paper.

Method

CounterfactualGAN aims to find targeted counterfactuals for explaining black-box NLP classifiers and regressors in a model-agnostic manner. It assumes (1) access to the training set the black box was trained on (or a similar, sufficiently large domain-specific dataset) and (2) the ability to query the predictive function of the black box. Generator G, discriminator D, encoder Enc and decoder Dec are trained in a two-phase process.
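The second assumption, query-only access to the black box, can be illustrated with a minimal sketch. It is not taken from the repository: the names `is_valid_counterfactual`, `toy_sentiment` and the `tolerance` parameter are hypothetical, and the toy keyword classifier merely stands in for a real predictive model. The sketch only shows how a candidate text could be checked against a target output without inspecting model internals.

```python
from typing import Callable, Optional, Sequence

# Hypothetical interface: a black box maps a batch of texts to one output per
# text (a class label for classification, a float for regression).
BlackBox = Callable[[Sequence[str]], Sequence]


def is_valid_counterfactual(
    predict: BlackBox,
    original: str,
    candidate: str,
    target,
    tolerance: Optional[float] = None,
) -> bool:
    """Check a candidate using only queries to the black box.

    With tolerance=None the target is treated as a class label (exact match);
    otherwise it is treated as a regression value that must be matched to
    within `tolerance`.
    """
    if candidate == original:
        # An unchanged text is not a counterfactual.
        return False
    prediction = predict([candidate])[0]
    if tolerance is None:
        return prediction == target
    return abs(prediction - target) <= tolerance


def toy_sentiment(texts: Sequence[str]) -> list:
    """Stand-in black box for illustration only (an assumption, not a real model)."""
    return ["positive" if "good" in t.lower() else "negative" for t in texts]
```

For example, `is_valid_counterfactual(toy_sentiment, "The film was bad.", "The film was good.", "positive")` evaluates to `True`, because the toy model assigns the target label to the edited text.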

Datasets

CounterfactualGAN is compared against four baseline methods on three datasets (accessible through dataset.py):

Dataset	Class	Task	URL	Folder format
Hatespeech	`Hatespeech()`	Regression	[url]	`hatespeech_data.csv`
SST-2	`SST()`	Binary classification	[url]	`SST2/*.tsv`
SNLI	`SNLI()`	Three-class classification	[url]	`snli_1.0/snli_1.0_*.txt`
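
Before using the classes above, it may help to confirm that the downloaded files match the folder formats in the table. The following sketch is not part of the repository: `find_dataset_files` and `EXPECTED_LAYOUT` are hypothetical names, and it assumes all three datasets sit under a single data directory, which the README does not state.

```python
from pathlib import Path

# File patterns copied from the "Folder format" column of the table above.
EXPECTED_LAYOUT = {
    "Hatespeech": "hatespeech_data.csv",
    "SST-2": "SST2/*.tsv",
    "SNLI": "snli_1.0/snli_1.0_*.txt",
}


def find_dataset_files(data_root):
    """Map each dataset name to the sorted list of matching files under data_root.

    An empty list means the files for that dataset were not found.
    """
    root = Path(data_root)
    return {
        name: sorted(str(path) for path in root.glob(pattern))
        for name, pattern in EXPECTED_LAYOUT.items()
    }
```

Calling `find_dataset_files("data")` (where `data` is a placeholder path) returns a dictionary in which any dataset with an empty list still needs to be downloaded or moved into place.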

Citation

@inproceedings{robeer-etal-2021-generating-realistic,
    title = {Generating Realistic Natural Language Counterfactuals},
    author = {Robeer, Marcel and Bex, Floris and Feelders, Ad},
    booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2021},
    month = nov,
    year = {2021},
    address = {Punta Cana, Dominican Republic},
    publisher = {Association for Computational Linguistics},
    url = {https://aclanthology.org/2021.findings-emnlp.306},
    pages = {3611--3625},
}
