Content moderation is crucial for encouraging constructive online debate on social media sites. In this group project, participants are asked to build systems that identify offensive spans in code-mixed Tamil social media text.
Offensive Span Identification in Tamil @RANLP-2023
Offensive Language Detection in Dravidian Languages (Tamil)
| Faculty | Slot | Course | Course Code |
| --- | --- | --- | --- |
| Dr. Ratnavel Rajalakshmi | L33+L34 (G1 Slot) | Essentials of Data Analytics | CSE3506 |
| Name | Register Number | Branch |
| --- | --- | --- |
| Hariket Sukesh Kumar Sheth (Team Leader) | 20BCE1975 | CSE Core |
| Manasvi Maheshwari | 20BAI1032 | CSE AI & ML |
| Suraj Shah | 20BRS1122 | CSE Robotics |
This repository contains all of our work for the Offensive Language Identification tasks organised at RANLP 2023 on CodaLab.
To execute these programs, you must have the following:
pytorch
transformers
sadice
seaborn
sklearn
matplotlib
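As a quick sanity check before running anything, the sketch below reports which of the listed requirements cannot be found in the current environment. The import names are assumptions: `pytorch` is imported as `torch`, and the other entries are assumed to be importable under the names listed above. The helper `missing_modules` is a hypothetical name introduced only for this example.

```python
from importlib.util import find_spec

# Import names for the requirements listed above. "pytorch" is imported as
# "torch"; the remaining names are assumed to match their import names.
REQUIRED_MODULES = ["torch", "transformers", "sadice", "seaborn", "sklearn", "matplotlib"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of `names` that cannot be located in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Any module reported as missing needs to be installed before the train scripts will run.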
The pretrained transformers BERT, IndicBERT, and XLM-RoBERTa were used for the Offensive Language Identification task. In addition to the original pretrained models, we used modified versions of them.
The customised versions were created by freezing the base layers and adding a fully connected (fc) layer on top, trained with custom loss routines: nll_loss and the sadice loss.
To reproduce the results, clone this repository and set your dataset path in the train scripts before running them.
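The customisation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the train scripts: the class and function names are hypothetical, the number of labels (2) and the class weights are made up, and the first token's hidden state is assumed to serve as the sentence representation. To keep the example runnable without downloading weights, it uses a tiny randomly initialised `BertModel`; in practice a pretrained encoder would take its place. The Dice function follows the general self-adjusting Dice formulation and may differ in detail from the `sadice` package.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import BertConfig, BertModel


class FrozenEncoderClassifier(nn.Module):
    """Frozen transformer encoder with one trainable fc layer on top (hypothetical name)."""

    def __init__(self, encoder, num_labels=2):
        super().__init__()
        self.encoder = encoder
        # Freeze every encoder parameter so that only the fc layer is updated.
        for param in self.encoder.parameters():
            param.requires_grad = False
        self.fc = nn.Linear(encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():
            hidden = self.encoder(
                input_ids=input_ids, attention_mask=attention_mask
            ).last_hidden_state
        # Assumption: the first token's hidden state represents the sentence.
        return self.fc(hidden[:, 0])  # raw logits


def weighted_nll_loss(logits, labels, class_weights):
    """NLL loss on log-softmax outputs, with per-class weights."""
    return F.nll_loss(F.log_softmax(logits, dim=-1), labels, weight=class_weights)


def self_adjusting_dice_loss(logits, labels, alpha=1.0, gamma=1.0):
    """Sketch of a self-adjusting Dice loss on the probability of the true class."""
    probs = torch.softmax(logits, dim=-1).gather(1, labels.unsqueeze(1)).squeeze(1)
    weighted = ((1.0 - probs) ** alpha) * probs
    return (1.0 - (2.0 * weighted + gamma) / (weighted + 1.0 + gamma)).mean()


# Tiny randomly initialised encoder so the sketch runs without downloads.
torch.manual_seed(0)
tiny_config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)
model = FrozenEncoderClassifier(BertModel(tiny_config), num_labels=2)

input_ids = torch.randint(0, 100, (4, 8))    # 4 dummy sentences of 8 tokens
attention_mask = torch.ones_like(input_ids)
labels = torch.tensor([0, 1, 0, 1])

logits = model(input_ids, attention_mask)
loss_nll = weighted_nll_loss(logits, labels, torch.tensor([1.0, 3.0]))  # made-up weights
loss_dice = self_adjusting_dice_loss(logits, labels)
loss_nll.backward()  # gradients reach only the fc layer
```

Because the encoder parameters have `requires_grad` set to `False`, an optimiser built from `model.fc.parameters()` is enough to train this variant.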
Our results for the Offensive Language Identification Task
Table: Results on Offensive Language Development Dataset

| Model Name | Accuracy |
| --- | --- |
| mBERT Cased | 0.76 |
| XLMR | 0.76 |
| IndicBERT | 0.74 |
| XLMR with NLL Loss and Class Weights | 0.64 |
| XLMR with Sadice Loss | 0.61 |
| mBERT with Sadice Loss | 0.61 |
| mBERT with NLL Loss and Class Weights | 0.58 |

Table: Results on Offensive Language Test Dataset

| Model Name | Accuracy |
| --- | --- |
| mBERT Cased | 0.75 |
| XLMR | 0.75 |
| IndicBERT | 0.73 |
| XLMR with NLL Loss and Class Weights | 0.64 |
| XLMR with Sadice Loss | 0.61 |
| mBERT with Sadice Loss | 0.61 |
| mBERT with NLL Loss and Class Weights | 0.59 |