SiddeshSambasivam / Mulitilingual-Toxic-Comment-Classification

A BERT model to identify toxic comments across multiple languages, trained on TPUs.

Mulitilingual-Toxic-Comment-Classification


NOTE: The model is still in the fine-tuning stage.

Context

This repository contains the notebooks for the Jigsaw Multilingual Toxic Comment Classification competition on Kaggle.

About the Competition

It only takes one toxic comment to sour an online discussion. The Conversation AI team, a research initiative founded by Jigsaw and Google, builds technology to protect voices in conversation. A main area of focus is machine learning models that can identify toxicity in online conversations, where toxicity is defined as anything rude, disrespectful or otherwise likely to make someone leave a discussion. If these toxic contributions can be identified, we could have a safer, more collaborative internet.
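To make the task concrete, a trivial baseline classifier is sketched below. This is not the repository's method (which uses BERT); the blocklist and function names are hypothetical, and the point is only to show the input/output shape of toxicity classification: a comment string in, a toxic/non-toxic decision out.

```python
# Toy keyword baseline (illustration only, not the repo's approach):
# flag a comment as toxic if it contains any word from a small blocklist.
# A fine-tuned BERT model replaces this with learned, contextual scoring.

BLOCKLIST = {"idiot", "stupid", "hate"}  # hypothetical word list

def is_toxic(comment: str) -> bool:
    """Return True if any blocklisted word appears in the comment."""
    words = {w.strip(".,!?").lower() for w in comment.split()}
    return not BLOCKLIST.isdisjoint(words)

print(is_toxic("You are an idiot!"))               # True
print(is_toxic("Thanks for the helpful answer."))  # False
```

A keyword baseline like this fails on misspellings, sarcasm, and other languages, which is exactly why the competition targets a multilingual learned model.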

Progress

Fine-tuned the model, increasing accuracy on the multilingual test data from 0.8249 to 0.9001.


Languages

Jupyter Notebook 100.0%