JesseGuerrero / attacking_neural_text_detectors

Code for "Attacking Neural Text Detectors"

Home Page: https://arxiv.org/abs/2002.11768


Attacking Neural Text Detectors

The original dataset in "./data/" is 100% synthetic, generated by GPT-2. The experiments test whether a neural text detector can be fooled into classifying these texts as human-written. Run main.py to start the experiments. The following global constants control each run:
- EXPERIMENT_NAME: the name of the folder that holds the result files.
- ADVERSARIAL_TYPE: the type of change applied to each text.
- TEXT_TO_CHANGE: the number of texts to make adversarial.
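A run might be configured like this at the top of main.py (the constant names come from the README; the values shown are illustrative assumptions, not the repository's defaults):

```python
# Illustrative experiment configuration; values are assumptions.
EXPERIMENT_NAME = "homoglyph-test"      # results folder name
ADVERSARIAL_TYPE = "replace-char"       # one of the adversarial types below
TEXT_TO_CHANGE = 100                    # number of texts to perturb
```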

Adversarial Types:
- 'do-nothing': nothing is done.
- 'replace-char': replaces characters with the homoglyphs below.
- 'random-order-replace-char': same as 'replace-char', except the input text lines are shuffled.
- 'misspelling': replaces certain words with misspellings from misspellings.json.
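The 'replace-char' attack can be sketched as a simple character substitution: swap selected Latin letters for visually identical Unicode homoglyphs so the text looks unchanged to a human but tokenizes differently for the detector. The mapping below is a minimal illustrative example; the repository defines its own homoglyph table.

```python
# Minimal sketch of a homoglyph substitution attack.
# The mapping here is an assumption for illustration; the actual
# table used by the repository may differ.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small a
    "e": "\u0435",  # Cyrillic small ie
    "o": "\u043e",  # Cyrillic small o
}

def replace_chars(text: str, table: dict = HOMOGLYPHS) -> str:
    """Replace each mapped character with its homoglyph."""
    return "".join(table.get(ch, ch) for ch in text)
```

The perturbed string renders identically in most fonts, which is exactly why homoglyph attacks are hard to spot by eye.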


Original README.md

Code for "Attacking Neural Text Detectors" (https://arxiv.org/abs/2002.11768).

Run python download_dataset.py to download the GPT-2 top k-40 neural text test set created by OpenAI. For more documentation regarding this and similar datasets, visit https://github.com/openai/gpt-2-output-dataset.

The OpenAI RoBERTa neural text detector can be downloaded by running wget https://openaipublic.azureedge.net/gpt-2/detector-models/v1/detector-large.pt.

Install requirements via pip install -r requirements.txt.

Run python main.py to run a sample experiment.
