victorknox / Hate-Speech-Detection-in-Hindi

A rule-based Hate Speech Detector for Hindi

Hate-Speech-Detection-in-Hindi

Table of Contents
  1. About The Project
  2. Built With
  3. Usage

About The Project

Hate speech is extremely common on platforms such as Twitter and Facebook, in comment sections, and even on blogs and biased online publications. Profanity filters exist, but they only catch obscenities and swear words. We find that the vast majority of hate speech consists of veiled attacks, or uses words that would be completely innocuous in other contexts but are being used to attack an individual or group. Much of this is context-dependent: gauging hatefulness with near-perfect accuracy would require knowing the context of each sentence and having some knowledge of the topic under discussion, which is beyond the scope of our rule-based algorithm. However, we find that we can achieve a healthy precision not just in detecting hate speech, but also in separating it into weakly and strongly hateful speech.

We use the paper linked here as a reference to build a tool that searches a Hindi corpus (in this case a collection of tweets; the corpus we used can be found here) for hate speech and then categorizes each tweet as “weakly hateful”, “strongly hateful” or “no hate” based on the amount of hate content present.
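To make the three-way categorization concrete, here is a minimal sketch of how a rule-based scorer of this kind could work. It is not the notebook's actual implementation: the term lists, the weights, the threshold and the names `STRONG_TERMS`, `WEAK_TERMS`, `hate_score` and `categorize` are all assumptions made for illustration, and the placeholder tokens stand in for real Hindi lexicon entries.

```python
from typing import List

# Hypothetical placeholder lexicons. A real detector would load curated
# Hindi word lists here; these tokens are illustrative only.
STRONG_TERMS = {"strong_term_1", "strong_term_2"}
WEAK_TERMS = {"weak_term_1", "weak_term_2"}

# Assumed weights and threshold, chosen only to demonstrate the idea.
STRONG_WEIGHT = 2
WEAK_WEIGHT = 1
STRONG_THRESHOLD = 3


def hate_score(tokens: List[str]) -> int:
    """Sum the weights of all lexicon terms found in the token list."""
    score = 0
    for token in tokens:
        if token in STRONG_TERMS:
            score += STRONG_WEIGHT
        elif token in WEAK_TERMS:
            score += WEAK_WEIGHT
    return score


def categorize(text: str) -> str:
    """Map a tweet to one of the three labels based on its score."""
    tokens = text.split()  # naive whitespace tokenization (an assumption)
    score = hate_score(tokens)
    if score == 0:
        return "no hate"
    if score < STRONG_THRESHOLD:
        return "weakly hateful"
    return "strongly hateful"
```

With these assumed weights, a tweet containing a single weak term is labelled “weakly hateful”, while one containing both a strong and a weak term reaches the threshold and is labelled “strongly hateful”. A lexicon lookup like this cannot see context, which is the limitation described above.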

Built With

Usage

  • Run the notebook.
  • Head over to the Data section and change the file name to point to the data you want to use.
  • The results will be saved in the results.csv file.
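The steps above amount to reading an input file, labelling each entry and writing the labels to results.csv. The sketch below shows one way to do that outside the notebook using only the standard library. The function name `label_file`, the `text` column name, the two-column output layout and the `classify` callback are assumptions for illustration; the notebook's actual input format and output columns may differ.

```python
import csv
from typing import Callable


def label_file(
    input_path: str,
    output_path: str,
    classify: Callable[[str], str],
    text_column: str = "text",  # assumed column name, not from the project
) -> int:
    """Label every row of a CSV file and write the results to output_path.

    Returns the number of rows written (excluding the header).
    """
    count = 0
    # UTF-8 is specified explicitly so that Devanagari text round-trips.
    with open(input_path, newline="", encoding="utf-8") as src, \
            open(output_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.writer(dst)
        writer.writerow([text_column, "label"])
        for row in reader:
            text = row[text_column]
            writer.writerow([text, classify(text)])
            count += 1
    return count
```

A call such as `label_file("tweets.csv", "results.csv", my_classifier)` would then produce the output file, where `tweets.csv` and `my_classifier` are hypothetical stand-ins for your own data file and labelling function.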

For more details regarding the project, please refer to the Project Report.

Languages

Language:Jupyter Notebook 100.0%