Participatory Approaches to Building Datasets on Abuse

Question

tarunima opened this issue 3 months ago

Description:

Automated approaches to abuse detection rely on annotated datasets. At least at present, unsupervised machine learning alone cannot detect abuse across languages. To fill the gap in abuse detection datasets for Indian languages, Tattle started the Uli project specifically to create datasets on gendered abuse in Indian languages. The focus is also on taking a survivor-centered perspective on abuse: the dataset was created with people of marginalized genders who are at the receiving end of abuse. This first dataset, on abusive tweets, helped us develop a methodology for participatory datasets that we would now like to extend to more languages and modalities.

The Scope of This Task:

Review literature on datasets for abuse detection in images, videos and audio.
Create a dataset of images from social media that can be annotated by the existing community of researchers, survivors and activists.
Expand the community of annotators.
Conduct qualitative research to define abuse in multimodal datasets.
Organize annotations.
Release the dataset.

This ticket should be treated as a statement of intent for a multi-year project. If you're interested in collaborating on this project, please leave a comment.

github-actions · Answer 1 · Thu Aug 01 2024 10:30:25 GMT+0800 (China Standard Time)

This issue is stale because it has been open for 30 days with no activity.

Aditya Narayan Sankaran · Answer 2 · Sun Sep 15 2024 16:01:43 GMT+0800 (China Standard Time)

Hi! Is this task still considering participants? I am interested in volunteering.

I research Online Hate Speech in low-resource settings. I have experience in curating datasets for gender-based stereotypes and I have worked on Multi-Modal Audio Abuse Detection in Low Resource Settings.