fabiopernisi / awesome-cultural-nlp

Resources for cultural NLP research

Awesome Cultural NLP

A curated list of awesome cultural NLP resources, inspired by awesome-computer-vision.

Survey

Title Conference / Journal Paper Code Remarks
Culturally Aware and Adapted NLP: A Taxonomy and a Survey of the State of the Art Arxiv 2024 2406.03930
Towards Measuring and Modeling “Culture” in LLMs: A Survey Arxiv 2024 2403.15412 Github Cool paper!
Challenges and Strategies in Cross-Cultural NLP ACL 2022 2203.10020

Dataset

Title Conference / Journal Paper Code Remarks
CultureBank: An Online Community-Driven Knowledge Base Towards Culturally Aware Language Technologies Arxiv 2024 2404.15238
NORMAD: A Benchmark for Measuring the Cultural Adaptability of Large Language Models Arxiv 2024 2404.12464 Data Data
An image speaks a thousand words, but can everyone listen? On image transcreation for cultural relevance Arxiv 2024 2404.01247 Code and Data Data + Application
No Culture Left Behind: Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking on 1000+ Sub-Country Regions and 2000+ Ethnolinguistic Groups Arxiv 2024 2402.09369v1 Data
The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models Arxiv 2024 (under review) 2404.16019 Repository Code and Data
Exploring Cross-Cultural Differences in English Hate Speech Annotations: From Dataset Construction to Analysis NAACL 2024 2308.16705 Data+Code
CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence LREC-COLING '24 https://arxiv.org/pdf/2403.06412 Data
Bridging Cultural Nuances in Dialogue Agents through Cultural Value Surveys EACL Findings 2024 2401.10352 Dataset
Culturally Aware Natural Language Inference EMNLP 2023 (Findings) 2023.findings-emnlp.509 Data
Global Voices, Local Biases: Socio-Cultural Prejudices across Languages EMNLP 2023 2310.17586 Data Data+Analysis
NORMSAGE: Multi-Lingual Multi-Cultural Norm Discovery from Conversations On-the-Fly EMNLP 2023 2210.08604 Code and Data NormsKB
GeoDE: a Geographically Diverse Evaluation Dataset for Object Recognition NeurIPS 2023 2301.02560 Code and Data
SeeGULL: A Stereotype Benchmark with Broad Geo-Cultural Coverage Leveraging Generative Models ACL 2023 2305.11840 Code
FORK: A Bite-Sized Test Set for Probing Culinary Cultural Biases in Commonsense Reasoning Models ACL Findings 2023 2023.findings-acl.631 Dataset
Multi-lingual and Multi-cultural Figurative Language Understanding ACL Findings 2023 2305.16171 Code
EnCBP: A New Benchmark Dataset for Finer-Grained Cultural Background Prediction in English ACL Findings 2022 2203.14498
Re-contextualizing Fairness in NLP: The Case of India AACL 2022 2209.12226 Data Data+Analysis
Visually Grounded Reasoning across Languages and Cultures EMNLP 2021 2109.13238 Website EMNLP 2021 Best Paper
Would you Rather? A New Benchmark for Learning Machine Alignment with Cultural Values and Social Preferences ACL 2020 2020.acl-main.477

Image Captioning

Title Conference / Journal Paper Code Remarks
CIC: A framework for Culturally-aware Image Captioning IJCAI 2024 2402.05374 Webpage

Models

Vision and Language

Title Conference / Journal Paper Code Remarks
GIVL: Improving Geographical Inclusivity of Vision-Language Models With Pre-Training Methods CVPR 2023 2301.01893 Code (not released yet)

Evaluation

LLMs

Title Conference / Journal Paper Code Remarks
Cultural Conditioning or Placebo? On the Effectiveness of Socio-Demographic Prompting Arxiv 2024 2406.11661
Extrinsic Evaluation of Cultural Competence in Large Language Models Arxiv 2024 2406.11565
CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs’ (Lack of) Multicultural Knowledge Arxiv 2024 2404.06664
Having Beer after Prayer? Measuring Cultural Bias in Large Language Models ACL 2024 2305.14456 Code

Text-to-image

Title Conference / Journal Paper Code Remarks
The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention Arxiv 2024 2407.00377v1
On the Cultural Gap in Text-to-Image Generation Arxiv 2023 2307.02971 Code

VLMs

Title Conference / Journal Paper Code Remarks
From Local Concepts to Universals: Evaluating the Multicultural Understanding of Vision-Language Models Arxiv 2024 2407.00263

Analysis

Text-to-image

Title Conference / Journal Paper Code Remarks
ViSAGe: A Global-Scale Analysis of Visual Stereotypes in Text-to-Image Generation ACL 2024 2401.06310
DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity ICLR 2024 2308.06198 Code
Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis JAIR 2023 2209.08891 Code
Navigating Cultural Chasms: Exploring and Unlocking the Cultural POV of Text-To-Image Models Arxiv 2023 2310.01929 Code (not released yet)
Inspecting the Geographical Representativeness of Images from Text-to-Image Models ICCV 2023 2305.11080
Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale FAccT '23 2211.03759
Multilingual Conceptual Coverage in Text-to-Image Models ACL 2023 2306.01735 Code

LLMs

Title Conference / Journal Paper Code Remarks
Exploring Changes in Nation Perception with Nationality-Assigned Personas in LLMs Arxiv 2024 2406.13993
CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting Arxiv 2024 2404.10199v1 Code
Knowledge of cultural moral norms in large language models ACL 2023 2306.01857
Multilingual Language Models are not Multicultural: A Case Study in Emotion WASSA: ACL 2023 2307.01370
Social Commonsense for Explanation and Cultural Bias Discovery EACL 2023 2023.eacl-main.271
DOSA: A Dataset of Social Artifacts from Different Indian Geographical Subcultures LREC-COLING 2024 2403.14651 Code

VLMs

Title Conference / Journal Paper Code Remarks
Multilingual Diversity Improves Vision-Language Representations Arxiv 2024 2405.16915
No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision–Language Models Arxiv 2024 2405.13777
Computer Vision Datasets and Models Exhibit Cultural and Linguistic Diversity in Perception Arxiv 2024 2310.14356
Exploring Visual Culture Awareness in GPT-4V: A Comprehensive Probing Arxiv 2024 2402.06015
‘Person’ == Light-skinned, Western Man, and Sexualization of Women of Color: Stereotypes in Stable Diffusion EMNLP 2023 Findings 2310.19981

Cross-cultural Variations

Title Conference / Journal Paper Code Remarks
Cross-Cultural Analysis of Human Values, Morals, and Biases in Folk Tales EMNLP 2023 2023.emnlp-main.311
Social Commonsense for Explanation and Cultural Bias Discovery EACL 2023 2023.eacl-main.271
Cross-cultural variation of speech-accompanying gesture: A review Language and Cognitive Processes: Volume 24, Issue 2, 2009 10.1080/01690960802586188

Alignment

Models

Title Conference / Journal Paper Code Remarks
Investigating Cultural Alignment of Large Language Models Arxiv 2024 2402.13231
Unintended Impacts of LLM Alignment on Global Representation Arxiv 2024 2402.15018
Assessing Cross-Cultural Alignment between ChatGPT and Human Societies: An Empirical Study C3NLP: EACL 2023 2303.17466 Analysis
Probing Pre-Trained Language Models for Cross-Cultural Differences in Values C3NLP: EACL 2023 2203.13722 Analysis

Data

Title Conference / Journal Paper Code Remarks
NLPositionality: Characterizing Design Biases of Datasets and Models ACL 2023 (Outstanding Paper) 2023.acl-long.505 Website

Methodology

Data

Title Conference / Journal Paper Code Remarks
Cultural Concept Adaptation on Multimodal Reasoning EMNLP 2023 EMNLP Main 18

Applications

Title Conference / Journal Paper Code Remarks
Cross-Cultural Similarity Features for Cross-Lingual Transfer Learning of Pragmatically Motivated Tasks EACL 2021 2006.09336 Sentiment Analysis

Contributing

Please feel free to send me pull requests or email (khanuja.simran7@gmail.com) to add links.

Licenses

License

CC0

To the extent possible under law, Simran Khanuja has waived all copyright and related or neighboring rights to this work.
