callmesanfornow / masters-thesis

Codebase for my Master's Thesis on Cross-Lingual Abuse Detection using Multi-Modal features


Bridging the Language Gap: A Multi-modal Approach for Comprehensive Audio Abuse Detection in Indian Languages

Code repository for the dissertation submitted as a course requirement for the Master of Science (Data Science and Computing)

Abstract

Social media platforms, with their diverse communication styles encompassing text, audio, and video, have empowered the formation of global communities and intercultural exchange. However, these same features can also be exploited to harbor online abuse. Current methods for abuse detection primarily focus on analyzing textual content, neglecting the complexities inherent in spoken language, such as the use of loan words and culturally specific nuances. Building on the study by Sharon et al. (2022), this work proposes an approach to address this challenge: a multi-modal model for cross-lingual audio abuse detection. The approach goes beyond analyzing audio features alone by additionally incorporating the emotions conveyed through speech and any accompanying text data. This multifaceted approach aims to achieve a more comprehensive understanding of the content being analyzed.

Furthermore, this research investigates the hypothesis that abusive language may share underlying similarities across different languages, through a para-linguistic analysis of abusive language. This line of inquiry is particularly relevant in the context of developing a cross-lingual approach to audio abuse detection. By leveraging insights from the studies on audio abuse detection by Sharon et al. (2022) and Gupta et al. (2022a), and by studying non-textual features, this work aims to create a more inclusive online environment that fosters safe communication across cultural and linguistic boundaries. By bridging the gap in cross-lingual audio abuse detection, this research paves the way for a more robust and culturally aware approach to online safety.
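The abstract describes combining audio features, speech emotion, and accompanying text into one model. As a rough illustration only, the sketch below shows one common way such modalities can be combined: concatenating per-modality feature vectors and scoring the result with a logistic function. This is an assumption made for illustration, not necessarily the fusion method used in this repository. The function names `fuse_features` and `abuse_score`, the toy vectors, and the weights are all hypothetical.

```python
import math
from typing import List, Sequence


def fuse_features(
    audio: Sequence[float],
    emotion: Sequence[float],
    text: Sequence[float],
) -> List[float]:
    """Concatenate per-modality feature vectors into one fused vector.

    Hypothetical helper: the thesis code may fuse modalities differently.
    """
    return [*audio, *emotion, *text]


def abuse_score(
    fused: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Return a logistic score in (0, 1) for the fused feature vector.

    The weights stand in for a trained classifier and are illustrative only.
    """
    if len(fused) != len(weights):
        raise ValueError("fused vector and weights must have the same length")
    z = bias + sum(w * x for w, x in zip(weights, fused))
    return 1.0 / (1.0 + math.exp(-z))


# Toy, made-up feature vectors (not real data from the thesis).
audio_vec = [0.2, -0.1, 0.4]   # e.g. acoustic embedding
emotion_vec = [0.7, 0.1]       # e.g. emotion embedding
text_vec = [0.0, 0.3]          # e.g. transcript embedding

fused_vec = fuse_features(audio_vec, emotion_vec, text_vec)

# Untrained placeholder weights: every input maps to the neutral score.
placeholder_weights = [0.0] * len(fused_vec)
score = abuse_score(fused_vec, placeholder_weights)
```

In practice the weights would come from a trained classifier, and each vector from a modality-specific encoder; both are outside the scope of this sketch.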

Note: This code has not been refactored, so it is not optimized for deployment. Please use it at your own discretion.

License:MIT License


Languages

Language:Python 100.0%