yilingchung / Towards_KN_CN_Generation

Knowledge-bound counter speech generation to challenge hate speech

Towards Knowledge-Grounded Counter Narrative Generation for Hate Speech

This work aims to generate knowledge-bound counter narratives using two modules: a knowledge retrieval module and a counter narrative generation module.

Requirements:

Java 1.8+
Solr
Keyphrase digger

transformers
rouge_score
spaCy

Knowledge Retrieval Module

Under KN_CONAN_final_data, we provide the final CONAN dataset paired with the corresponding silver knowledge. If you wish to prepare your own knowledge repository, follow the steps below. Otherwise, skip this section.

  1. Download the CONAN dataset and the knowledge repository
  2. Prepare queries
  3. Retrieve relevant knowledge
  4. Select knowledge sentences

1. Download Data

1.1. Hate countering dataset

1.2. Knowledge Repository

We use the following datasets for creating relevant knowledge.

2. Prepare Queries

2.1. Query extraction

We use Keyphrase Digger to extract keyphrase queries for both hate speech and counter narratives in CONAN.

    i. Create a txt file for each HS and CN in CONAN by running create_text_file.py.
    ii. After compiling Keyphrase Digger, make sure that the resulting files from i. are stored under KD/KD-Runner/target in your Keyphrase Digger repository.
    iii. Store run_kd.sh under KD/KD-Runner/target and run it to retrieve keyphrases for HS and CN with Keyphrase Digger.
    iv. Extract the retrieved keyphrases from iii. and add them to the CONAN data using extract_keyphrase.py.
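
The first step above can be sketched as follows. This is only an illustration of writing one text file per hate speech (HS) and counter narrative (CN), not the repository's create_text_file.py. The function name write_text_files, the (HS, CN) tuple input, and the hs_<i>.txt / cn_<i>.txt file names are all assumptions made for the example.

```python
from pathlib import Path


def write_text_files(pairs, out_dir):
    """Write one .txt file per hate speech (HS) and counter narrative (CN).

    `pairs` is an iterable of (hate_speech, counter_narrative) strings.
    The file naming scheme (hs_<i>.txt, cn_<i>.txt) is hypothetical.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, (hs, cn) in enumerate(pairs):
        for prefix, text in (("hs", hs), ("cn", cn)):
            path = out / f"{prefix}_{i}.txt"
            # One document per file, terminated by a single newline.
            path.write_text(text.strip() + "\n", encoding="utf-8")
            written.append(path)
    return written
```

Pointing out_dir at KD/KD-Runner/target would place the files where the second step expects them, assuming the keyphrase extractor reads plain UTF-8 text files from that directory.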

2.2. Query generation

We use a transformer implementation to train a model and generate keyphrase queries.

3. Retrieve relevant knowledge

To retrieve relevant knowledge using Solr, run retrieve_kn_solr.py.

Solr is used to index the articles in the knowledge repository and to retrieve relevant knowledge given a query.

Some Solr commands:

  • Launch solr: run solr-8.8.1/bin/solr restart or ./bin/solr restart

  • Index data (e.g., index all articles under datasets/wikitext/ to knowledge repository called knowledgecollection): bin/post -c knowledgecollection -p 8989 datasets/wikitext/*

  • Search the knowledge repository called knowledgecollection for information about islamic faith in the content field (the space in the query is URL-encoded as %20): curl "http://localhost:8989/solr/knowledgecollection/select?q=(content:islamic%20faith)&rows=10&wt=json"

Check this tutorial for details on how to install Solr, index data, and use advanced search methods.
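
The search command above can also be issued from Python. The following is a minimal sketch using only the standard library, not the code in retrieve_kn_solr.py. The helper names build_select_url and search are hypothetical, and the defaults (port 8989, collection knowledgecollection, field content) are taken from the examples above. The sketch groups the query terms as content:(islamic faith) so that every term is matched against the content field, which is an assumption about the intended query.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(query, field="content", rows=10,
                     host="localhost", port=8989,
                     collection="knowledgecollection"):
    """Build a Solr /select URL with a properly URL-encoded query string."""
    params = urlencode({"q": f"{field}:({query})", "rows": rows, "wt": "json"})
    return f"http://{host}:{port}/solr/{collection}/select?{params}"


def search(query, **kwargs):
    """Return the list of matching documents (requires a running Solr)."""
    with urlopen(build_select_url(query, **kwargs)) as resp:
        return json.load(resp)["response"]["docs"]


if __name__ == "__main__":
    # Printing the URL does not require a running Solr instance.
    print(build_select_url("islamic faith"))
```

Letting urlencode handle the escaping avoids malformed requests when a keyphrase query contains spaces or other reserved characters.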

4. Select knowledge sentences

  1. Run kn_sentence_retriever.py to apply the knowledge sentence selector, which picks the top-N knowledge sentences and saves them in a single file, one entry per line.
  2. Run create_modelling_data.py to create the train, valid, and test data.
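
The selection criterion used by kn_sentence_retriever.py is not described here, so the following is only a sketch of the general idea of top-N sentence selection. It assumes candidates are ranked by a ROUGE-1-style unigram F1 overlap with the query, implemented with the standard library rather than the rouge_score package. All function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_f1(query, sentence):
    """Unigram-overlap F1 between a query and a candidate sentence."""
    q, s = Counter(tokenize(query)), Counter(tokenize(sentence))
    overlap = sum((q & s).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(s.values())
    recall = overlap / sum(q.values())
    return 2 * precision * recall / (precision + recall)


def select_top_n(query, sentences, n=5):
    """Return the n sentences with the highest overlap score."""
    ranked = sorted(sentences, key=lambda s: unigram_f1(query, s), reverse=True)
    return ranked[:n]


def save_sentences(sentences, path):
    """Save the selected sentences in a single file, one entry per line."""
    with open(path, "w", encoding="utf-8") as f:
        for s in sentences:
            # Collapse internal whitespace so each entry stays on one line.
            f.write(" ".join(s.split()) + "\n")
```

Any other relevance score, for example one computed with rouge_score, could be substituted for unigram_f1 without changing the rest of the sketch.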

Counter Narrative Generation Module

Multi-domain Knowledge-grounded hate countering dataset

The Gold Knowledge Test Set can be downloaded here. It contains hate speech and counter narrative pairs coupled with relevant background knowledge, and consists of 195 pairs covering multiple hate targets (islamophobia, misogyny, antisemitism, racism, and homophobia).

Citation

For more details on the data partition procedure, please see our paper.

@inproceedings{chung-etal-2021-towards,
    title = "Towards Knowledge-Grounded Counter Narrative Generation for Hate Speech",
    author = "Chung, Yi-Ling  and
      Tekiro{\u{g}}lu, Serra Sinem  and
      Guerini, Marco",
    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-acl.79",
    doi = "10.18653/v1/2021.findings-acl.79",
    pages = "899--914",
}
