openai / moderation-api-release

Evaluation dataset for the paper "A Holistic Approach to Undesired Content Detection"

The evaluation dataset data/samples-1680.jsonl.gz is the test set used in the following paper:

@article{openai2022moderation,
  title={A Holistic Approach to Undesired Content Detection},
  author={Todor Markov and Chong Zhang and Sandhini Agarwal and Tyna Eloundou and Teddy Lee and Steven Adler and Angela Jiang and Lilian Weng},
  journal={arXiv preprint arXiv:2208.03274},
  year={2022}
}

Each line is a JSON object describing one sample, and each sample is labeled according to our taxonomy. Each category label is a binary flag; if a category is missing from the JSON object, its label is unknown for that sample.
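As a rough illustration of the format described above, the following sketch reads the gzip-compressed JSONL file and tallies labels while keeping "unknown" separate from "negative". It assumes that the JSON keys are the short labels from the taxonomy (`S`, `H`, `V`, and so on) and that the flags are stored as 0/1 values; neither is confirmed here, so adjust the `LABELS` list if the file uses different keys. The helper names `load_samples`, `label_of`, and `count_labels` are hypothetical and not part of this repository.

```python
import gzip
import json
from collections import Counter

# Assumption: the JSON keys are the short labels from the taxonomy table.
LABELS = ["S", "H", "V", "HR", "SH", "S3", "H2", "V2"]


def load_samples(path):
    """Yield one dict per non-empty line of a gzip-compressed JSONL file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def label_of(sample, label):
    """Return True/False for a known label, or None when the key is absent."""
    if label not in sample:
        return None  # absent key means the label is unknown, not negative
    return bool(sample[label])


def count_labels(samples):
    """Count positive, negative, and unknown samples for each label."""
    counts = {label: Counter() for label in LABELS}
    for sample in samples:
        for label in LABELS:
            value = label_of(sample, label)
            if value is None:
                counts[label]["unknown"] += 1
            elif value:
                counts[label]["positive"] += 1
            else:
                counts[label]["negative"] += 1
    return counts
```

Under these assumptions, `count_labels(load_samples("data/samples-1680.jsonl.gz"))` would return per-category counts. Treating a missing key as `None` rather than `False` matters for evaluation: samples with an unknown label for a category should be excluded from that category's metrics instead of being counted as negatives.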

| Category | Label | Definition |
| --- | --- | --- |
| sexual | S | Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness). |
| hate | H | Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste. |
| violence | V | Content that promotes or glorifies violence or celebrates the suffering or humiliation of others. |
| harassment | HR | Content that may be used to torment or annoy individuals in real life, or make harassment more likely to occur. |
| self-harm | SH | Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders. |
| sexual/minors | S3 | Sexual content that includes an individual who is under 18 years old. |
| hate/threatening | H2 | Hateful content that also includes violence or serious harm towards the targeted group. |
| violence/graphic | V2 | Violent content that depicts death, violence, or serious physical injury in extreme graphic detail. |

License: MIT License