tareknaous / camel

CAMeL Dataset

CAMeL: Cultural Appropriateness Measure Set for LMs

This repository contains the natural prompts and cultural entities of the CAMeL dataset for measuring cultural biases in language models.

For more details, see the accompanying paper: "Having Beer After Prayer? Measuring Cultural Bias in Large Language Models", accepted at ACL 2024.

Prompts

The folder prompts provides two types of prompts:

  • Culturally-contextualized prompts inside the camel-co folder, where only Arab entities are appropriate mask fillings
  • Culturally-agnostic prompts inside the camel-ag folder, where either Arab or Western entities are appropriate mask fillings

For both contextualized and agnostic cases, we provide two versions of the prompts:

  • a version for masked-lms where the [MASK] can have left and right natural context
  • a version for causal-lms where we rewrite certain prompts so that the natural context appears before the [MASK]
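
As a rough illustration of the difference between the two versions, the sketch below fills a prompt's [MASK] with an entity and, for the causal case, extracts the context that precedes the [MASK]. The helper names (fill_mask, causal_prefix), the English example prompts, and the assumption of exactly one [MASK] per prompt are hypothetical and not taken from the repository; the actual file layout is not shown here, so the prompts are given as plain strings.

```python
MASK = "[MASK]"


def fill_mask(prompt: str, entity: str) -> str:
    """Replace the [MASK] placeholder with an entity.

    Assumes exactly one [MASK] per prompt (an assumption of this sketch).
    """
    if prompt.count(MASK) != 1:
        raise ValueError("expected exactly one [MASK] in the prompt")
    return prompt.replace(MASK, entity)


def causal_prefix(prompt: str) -> str:
    """Return the text before [MASK], i.e. what a causal LM would be conditioned on."""
    prefix, _, _ = prompt.partition(MASK)
    return prefix.rstrip()


# Hypothetical English stand-ins for the two prompt versions.
masked_prompt = "I had [MASK] for dinner with my family."   # context on both sides
causal_prompt = "For dinner with my family, I had [MASK]"   # context before the mask

filled = fill_mask(masked_prompt, "falafel")
prefix = causal_prefix(causal_prompt)
```

With a masked LM the filled position can be scored using context on both sides, whereas a causal LM only sees the prefix, which is why the causal version keeps the [MASK] at the end.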

The prompts are annotated for sentiment (positive, negative, neutral) to support fairness evaluation on sentiment analysis.
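
One way such annotations could be used is sketched below: each sentiment-labeled prompt is filled with entities associated with Arab or Western culture, and a classifier's agreement with the annotated label is compared across the two groups. This is only an illustrative sketch, not necessarily the evaluation protocol of the paper; the function accuracy_by_culture, the toy classifier, and the English example data are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

MASK = "[MASK]"


def accuracy_by_culture(
    prompts: Iterable[Tuple[str, str]],   # (prompt with [MASK], annotated sentiment)
    entities: Iterable[Tuple[str, str]],  # (entity, associated culture)
    classify: Callable[[str], str],       # any sentiment classifier returning a label
) -> Dict[str, float]:
    """Fraction of filled prompts whose predicted label matches the annotation, per culture."""
    entities = list(entities)
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for text, gold in prompts:
        for entity, culture in entities:
            filled = text.replace(MASK, entity)
            total[culture] += 1
            if classify(filled) == gold:
                correct[culture] += 1
    return {culture: correct[culture] / total[culture] for culture in total}


# Hypothetical English stand-ins for prompts and entities.
prompts = [
    ("I really enjoyed meeting [MASK] today.", "positive"),
    ("I am annoyed with [MASK].", "negative"),
]
entities = [("Ahmad", "Arab"), ("John", "Western")]


def toy_classifier(text: str) -> str:
    # Deliberately biased stand-in: anything mentioning "Ahmad" is labeled negative.
    if "Ahmad" in text:
        return "negative"
    return "positive" if "enjoyed" in text else "negative"


scores = accuracy_by_culture(prompts, entities, toy_classifier)
```

A gap between the per-culture scores, as the biased toy classifier produces here, would suggest that the entity alone is shifting the predicted sentiment.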

Entities

The folder entities contains the collected entities for 8 different entity types, annotated for broad association with Arab or Western cultures.

Citation

@article{naous2023having,
  title={Having beer after prayer? measuring cultural bias in large language models},
  author={Naous, Tarek and Ryan, Michael J and Ritter, Alan and Xu, Wei},
  journal={arXiv preprint arXiv:2305.14456},
  year={2023}
}

Contact

Tarek Naous: Scholar | GitHub | LinkedIn | ResearchGate | Personal Website | tareknaous@gatech.edu

License: Other