Towards Knowledge-Grounded Counter Narrative Generation for Hate Speech

This work generates knowledge-grounded counter narratives using two modules: a knowledge retrieval module and a counter narrative generation module.

Requirements:

Java 1.8+
Solr
Keyphrase digger

transformers
rouge_score
spaCy

Knowledge Retrieval Module

Under KN_CONAN_final_data, we provide the final CONAN dataset paired with the corresponding silver knowledge. To prepare your own knowledge repository, follow the steps below; otherwise, skip this section.

Download CONAN dataset and knowledge repository
Prepare queries
Retrieve relevant knowledge
Select knowledge sentences

1. Download Data

1.1. Hate countering dataset

CONAN: CONAN - COunter NArratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech.

1.2. Knowledge Repository

We use the following datasets for creating relevant knowledge.

Newsroom: Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies.
WikiText-103: Pointer sentinel mixture models.

2. Prepare Queries

2.1. Query extraction

We use Keyphrase Digger to extract keyphrase queries for both hate speech and counter narratives in CONAN.

1. Create a txt file for each HS and CN in CONAN by running create_text_file.py.
2. After compiling Keyphrase Digger, make sure the files from step 1 are stored under KD/KD-Runner/target in your Keyphrase Digger repository.
3. Retrieve keyphrases for HS and CN with Keyphrase Digger: store run_kd.sh under KD/KD-Runner/target and run it there.
4. Extract the keyphrases retrieved in step 3 and add them to the CONAN data using extract_keyphrase.py.
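
As an illustration of the first step above, here is a minimal sketch of writing one text file per HS and CN. It is not the repository's create_text_file.py: the CSV input, the column names hateSpeech and counterSpeech, and the HS_<i>.txt / CN_<i>.txt naming scheme are all assumptions made for this example.

```python
import csv
from pathlib import Path


def write_text_files(csv_path, out_dir, hs_col="hateSpeech", cn_col="counterSpeech"):
    """Write each hate speech (HS) and counter narrative (CN) to its own .txt file.

    The column names and the file naming scheme are hypothetical; adapt them
    to the actual layout of your CONAN export.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.DictReader(f)):
            # One file for the HS and one for the CN of every pair.
            for tag, col in (("HS", hs_col), ("CN", cn_col)):
                path = out / f"{tag}_{i}.txt"
                path.write_text(row[col].strip(), encoding="utf-8")
                written.append(path)
    return written
```

Pointing out_dir at KD/KD-Runner/target would place the files where the second step expects them.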

2.2. Query generation

We use a transformer implementation to train a model that generates keyphrase queries.

3. Retrieve relevant knowledge

Retrieve relevant knowledge using Solr by running retrieve_kn_solr.py.

Solr is used to index the articles in the knowledge repository and to retrieve relevant knowledge given a query.

Some Solr commands:

Launch Solr: run solr-8.8.1/bin/solr restart or ./bin/solr restart
Index data (e.g., index all articles under datasets/wikitext/ to knowledge repository called knowledgecollection): bin/post -c knowledgecollection -p 8989 datasets/wikitext/*
An example of searching for information about islamic faith in the field content of the knowledge repository called knowledgecollection (the space in the query is URL-encoded as %20): curl "http://localhost:8989/solr/knowledgecollection/select?q=(content:islamic%20faith)&rows=10&wt=json"

See this tutorial for details on installing Solr, indexing data, and advanced search methods.
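
The search example above can also be issued from Python. The following is a minimal sketch using only the standard library, not the repository's retrieve_kn_solr.py. The function names build_select_url and search are hypothetical; the host, port, collection, and field defaults are taken from the commands above. It relies on Solr's standard JSON response layout, where matching documents are listed under response.docs.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(query, field="content", collection="knowledgecollection",
                     host="localhost", port=8989, rows=10):
    """Build a Solr select URL with the query string properly URL-encoded."""
    # Same query shape as the curl example: (content:islamic faith)
    params = {"q": f"({field}:{query})", "rows": rows, "wt": "json"}
    return f"http://{host}:{port}/solr/{collection}/select?{urlencode(params)}"


def search(query, **kwargs):
    """Send the query to a running Solr instance and return the matching documents."""
    with urlopen(build_select_url(query, **kwargs)) as resp:
        return json.load(resp)["response"]["docs"]
```

Calling search("islamic faith") requires Solr to be running with the knowledgecollection collection indexed. Letting urlencode handle the escaping avoids problems with spaces and parentheses in keyphrase queries.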

4. Select knowledge sentences

Apply the knowledge sentence selector to get the top-N knowledge sentences and save them in a single file, one entry per line, by running kn_sentence_retriever.py.
Create the train, valid, and test data by running create_modelling_data.py.
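
To give a rough idea of what top-N sentence selection involves, here is a minimal sketch that ranks the sentences of a retrieved article by token overlap with the query keyphrases. This is a simple stand-in and not the scoring used in kn_sentence_retriever.py; the function names, the regex-based sentence splitting, and the overlap score are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_n_sentences(document, keyphrases, n=5):
    """Return up to n sentences sharing the most tokens with the keyphrases."""
    query = set()
    for phrase in keyphrases:
        query |= tokenize(phrase)
    # Naive sentence split on end-of-sentence punctuation (an assumption;
    # a proper sentence segmenter such as spaCy's could be used instead).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    scored = [(len(tokenize(s) & query), i, s) for i, s in enumerate(sentences)]
    # Highest overlap first; ties keep the original sentence order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [s for score, _, s in scored[:n] if score > 0]
```

The selected sentences can then be joined and written one entry per line, matching the single-file format described above.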

Counter Narrative Generation Module

Multi-domain Knowledge-grounded hate countering dataset

The Gold Knowledge Test Set can be downloaded here. It contains 195 hate speech and counter narrative pairs coupled with relevant background knowledge, covering multiple hate targets (islamophobia, misogyny, antisemitism, racism, and homophobia).

Citation

For more details on the data partition procedure, please see our paper.

@inproceedings{chung-etal-2021-towards,
    title = "Towards Knowledge-Grounded Counter Narrative Generation for Hate Speech",
    author = "Chung, Yi-Ling  and
      Tekiro{\u{g}}lu, Serra Sinem  and
      Guerini, Marco",
    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-acl.79",
    doi = "10.18653/v1/2021.findings-acl.79",
    pages = "899--914",
}
