ritika-0111 / Bias-Toxic-Classification

BERT classification on the Jigsaw data with gender as the identity group of interest, followed by identifying bias in toxicity classification.

Bias-Classification

Dataset

In this project, the dataset used is Jigsaw Unintended Bias in Toxicity Classification, available on Kaggle (https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/data). A quick loading sketch follows the file list below.

Data Files

  • train.csv
  • test.csv
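
As a quick orientation, the sketch below loads `train.csv` and inspects the toxicity target together with the gender identity columns. It assumes the Kaggle files have been unpacked into a local `data/` folder; the column names (`comment_text`, `target`, `male`, `female`) follow the competition's schema.

```python
import pandas as pd

# Load the Kaggle training split (path assumes the competition files
# were downloaded into a local data/ folder).
train = pd.read_csv("data/train.csv")

# `target` is a toxicity score in [0, 1]; the gender identity
# columns used in this project are `male` and `female`.
print(train[["comment_text", "target", "male", "female"]].head())

# Treat a comment as toxic when target >= 0.5, following the
# competition's convention.
train["toxic"] = (train["target"] >= 0.5).astype(int)
print(train["toxic"].value_counts(normalize=True))
```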

Notebooks

In this project, gender is chosen as the identity group for identifying bias.

  • Data_Preparation.ipynb: In this notebook, we prepare the data so that it can be used in BERT_Data-Classification.ipynb and examine the bias in the dataset.

  • BERT_Data-Classification.ipynb: In this notebook, we perform text classification by fine-tuning a BERT-based model (a minimal fine-tuning sketch follows this list).

  • bias-toxicity-classification.ipynb: In this notebook, we perform toxicity classification using Logistic Regression and an LSTM architecture.
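
To make the BERT step concrete, here is a minimal fine-tuning sketch using the Hugging Face `transformers` library. The `bert-base-uncased` checkpoint, learning rate, and toy batch are illustrative assumptions, not necessarily what the notebook uses; in practice the inputs come from the prepared Jigsaw data.

```python
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

# Hypothetical tiny batch of comments and binary toxicity labels;
# in the notebook these come from the prepared Jigsaw data.
texts = ["you are wonderful", "you are a horrible person"]
labels = torch.tensor([0, 1])

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

encodings = tokenizer(texts, truncation=True, padding=True, max_length=128, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few illustrative update steps, not a full training loop
    outputs = model(**encodings, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Predicted toxicity probabilities for the toy batch.
model.eval()
with torch.no_grad():
    probs = torch.softmax(model(**encodings).logits, dim=-1)[:, 1]
print(probs)
```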

Strategy:

  • Importing libraries
  • Data cleaning
  • Exploratory data analysis
  • Data splitting
  • Logistic Regression baseline
  • LSTM model - single LSTM layer architecture
  • Comparing the overall AUC with the competition's bias AUC metrics (see the sketch after this list)
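
To illustrate the AUC comparison step, the sketch below computes the three per-subgroup bias AUCs defined by the competition (subgroup AUC, BPSN AUC, BNSP AUC) with scikit-learn. The `eval_df` frame, its column names, and the toy scores are hypothetical stand-ins for the predictions produced by the Logistic Regression or LSTM models.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def subgroup_auc(df, subgroup, label="toxic", score="pred"):
    # AUC restricted to comments mentioning the identity subgroup.
    sub = df[df[subgroup]]
    return roc_auc_score(sub[label], sub[score])

def bpsn_auc(df, subgroup, label="toxic", score="pred"):
    # Background-positive, subgroup-negative: non-toxic subgroup
    # comments mixed with toxic background comments.
    mask = (df[subgroup] & ~df[label]) | (~df[subgroup] & df[label])
    return roc_auc_score(df.loc[mask, label], df.loc[mask, score])

def bnsp_auc(df, subgroup, label="toxic", score="pred"):
    # Background-negative, subgroup-positive: toxic subgroup comments
    # mixed with non-toxic background comments.
    mask = (df[subgroup] & df[label]) | (~df[subgroup] & ~df[label])
    return roc_auc_score(df.loc[mask, label], df.loc[mask, score])

# Hypothetical evaluation frame: boolean identity flags, boolean
# toxicity labels, and model scores from either classifier.
eval_df = pd.DataFrame({
    "male":   [True, True, False, False, True, False],
    "female": [False, False, True, True, False, True],
    "toxic":  [True, False, True, False, False, True],
    "pred":   [0.9, 0.2, 0.7, 0.4, 0.1, 0.8],
})

for g in ["male", "female"]:
    print(g, subgroup_auc(eval_df, g), bpsn_auc(eval_df, g), bnsp_auc(eval_df, g))
```

A gap between a model's overall AUC and its BPSN/BNSP AUCs for a gender subgroup is what flags unintended bias: the model is systematically over- or under-scoring comments that mention that identity.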
