openai/moderation-api-release

Evaluation dataset for the paper "A Holistic Approach to Undesired Content Detection"

The evaluation dataset data/samples-1680.jsonl.gz is the test set used in the following paper:

@article{openai2022moderation,
  title={A Holistic Approach to Undesired Content Detection},
  author={Todor Markov and Chong Zhang and Sandhini Agarwal and Tyna Eloundou and Teddy Lee and Steven Adler and Angela Jiang and Lilian Weng},
  journal={arXiv preprint arXiv:2208.03274},
  year={2022}
}

Each line contains information about one sample in a JSON object and each sample is labeled according to our taxonomy. The category label is a binary flag, but if it does not include in the JSON, it means we do not know the label.

Category	Label	Definition
sexual	`S`	Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness).
hate	`H`	Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste.
violence	`V`	Content that promotes or glorifies violence or celebrates the suffering or humiliation of others.
harassment	`HR`	Content that may be used to torment or annoy individuals in real life, or make harassment more likely to occur.
self-harm	`SH`	Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders.
sexual/minors	`S3`	Sexual content that includes an individual who is under 18 years old.
hate/threatening	`H2`	Hateful content that also includes violence or serious harm towards the targeted group.
violence/graphic	`V2`	Violent content that depicts death, violence, or serious physical injury in extreme graphic detail.

openai / moderation-api-release

Evaluation dataset for the paper "A Holistic Approach to Undesired Content Detection"

About