iam13atman / Automatic-Language-Classification

Automatic Language Classifier is a machine-learning-based approach to classifying texts as Roman-typed Nepali, Devanagari Nepali, or English.

Automatic Language Classifier

Automatic Language Classifier is a machine-learning-based approach to classifying texts into three categories:

  • English (en)
  • Nepali (np)
  • Devanagari Nepali (np2)

Here, English refers to texts in the English language, Nepali refers to Nepali texts typed in Roman (Latin) script, and Devanagari Nepali refers to Nepali texts written in the Devanagari script.
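Because Devanagari Nepali is written in a different script from the other two categories, a simple character check can already separate it from English and Roman-typed Nepali; telling English apart from Roman-typed Nepali still needs a trained model. The sketch below is illustrative only and is not taken from the repository: the function names and the 0.5 threshold are assumptions, and it relies only on the Unicode Devanagari block (U+0900 to U+097F).

```python
# Illustrative sketch, not the repository's code: names and threshold are hypothetical.
DEVANAGARI_BLOCK = range(0x0900, 0x0980)  # Unicode Devanagari block, U+0900..U+097F


def is_devanagari(ch: str) -> bool:
    """True if the single character `ch` lies in the Devanagari Unicode block."""
    return ord(ch) in DEVANAGARI_BLOCK


def devanagari_ratio(text: str) -> float:
    """Share of Devanagari code points among the letters of `text`.

    Devanagari vowel signs and the virama are combining marks, so str.isalpha()
    is False for them; any code point in the Devanagari block is counted explicitly.
    """
    letters = [ch for ch in text if ch.isalpha() or is_devanagari(ch)]
    if not letters:
        return 0.0
    return sum(is_devanagari(ch) for ch in letters) / len(letters)


def coarse_script(text: str, threshold: float = 0.5) -> str:
    """Return "devanagari" (candidate np2) or "latin" (en or np, needs a model)."""
    return "devanagari" if devanagari_ratio(text) >= threshold else "latin"
```

For example, `coarse_script("तपाईंलाई कस्तो छ?")` returns `"devanagari"` and `coarse_script("timro naam ke ho?")` returns `"latin"`, so only the Latin-script texts would need to go on to an en/np classifier.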

Dataset

  • An NLTK-based corpus was used as the English dataset
  • The Roman-typed Nepali dataset was created from text extracted from social media
  • The Devanagari Nepali dataset was extracted from a Nepali corpus
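The README does not describe which model trainML.py trains. As a minimal, self-contained sketch of one common machine-learning approach to this kind of task, the code below trains a multinomial Naive Bayes classifier on character trigrams using only the Python standard library. The class name, the toy sentences standing in for the three datasets, and the hyperparameters (n=3, alpha=1.0) are all hypothetical; this is not the repository's implementation, which lists NLTK, scikit-learn, and fastText as requirements.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Lowercase, pad with spaces, and return overlapping character n-grams."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramNaiveBayes:
    """Hypothetical multinomial Naive Bayes over character n-grams (Laplace smoothing)."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n
        self.alpha = alpha
        self.ngram_counts = defaultdict(Counter)  # label -> n-gram frequencies
        self.total_ngrams = Counter()             # label -> total n-grams seen
        self.doc_counts = Counter()               # label -> number of training texts
        self.vocab = set()                        # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.ngram_counts[label].update(grams)
            self.total_ngrams[label] += len(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def log_scores(self, text):
        """Log prior plus smoothed log likelihood of each n-gram, per label."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        grams = char_ngrams(text, self.n)
        scores = {}
        for label in self.doc_counts:
            score = math.log(self.doc_counts[label] / total_docs)
            denom = self.total_ngrams[label] + self.alpha * vocab_size
            for g in grams:
                score += math.log((self.ngram_counts[label][g] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy sentences standing in for the three datasets described above
texts = [
    "the weather is nice today", "where are you going",  # en
    "timro naam ke ho", "ma ghar jaadai chu",            # np (Roman-typed Nepali)
    "तिमीलाई कस्तो छ", "म घर जाँदैछु",                    # np2 (Devanagari Nepali)
]
labels = ["en", "en", "np", "np", "np2", "np2"]

model = CharNgramNaiveBayes().fit(texts, labels)
print(model.predict("तिमी कहाँ जाँदैछौ"))  # np2
```

Character n-grams are a common choice for language identification because they capture spelling patterns without needing a tokenizer. In scikit-learn, the same idea can be expressed with `CountVectorizer(analyzer="char", ngram_range=(3, 3))` followed by `MultinomialNB`.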

Instructions

Before moving on to usage, extract all the .rar archives into a folder named "Preprocessing" in the same directory.
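Python's standard library cannot read RAR archives, so this step is normally done with an external tool. One way to script it is to call the `unrar` command-line program once per archive. This is a sketch that assumes `unrar` is installed and on the PATH and that the archives sit directly in the repository folder; the helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def unrar_commands(directory=".", dest="Preprocessing"):
    """Build one `unrar x` command per .rar file found directly in `directory`."""
    base = Path(directory)
    # A trailing separator marks the last argument as the destination folder for unrar.
    target = str(base / dest) + "/"
    return [["unrar", "x", str(rar), target] for rar in sorted(base.glob("*.rar"))]


def extract_all(directory=".", dest="Preprocessing"):
    """Create the destination folder and run each unrar command (assumes unrar is installed)."""
    (Path(directory) / dest).mkdir(exist_ok=True)
    for cmd in unrar_commands(directory, dest):
        subprocess.run(cmd, check=True)  # raises CalledProcessError if unrar fails
```

Building the command lists separately from running them makes it easy to inspect what would be executed before calling `extract_all()`.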

Required Libraries

  • NLTK
  • scikit-learn
  • fastText
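Before running the training script, a quick check can confirm that the three libraries are importable. The helper below is not part of the repository; it assumes the import names `nltk`, `sklearn`, and `fasttext` and the pip package names `nltk`, `scikit-learn`, and `fasttext`.

```python
from importlib.util import find_spec

# Import name -> pip package name (assumed; adjust if a different fastText build is used)
REQUIRED = {
    "nltk": "nltk",
    "sklearn": "scikit-learn",
    "fasttext": "fasttext",
}


def missing_packages(required=REQUIRED):
    """Return pip package names whose top-level module cannot be found."""
    return [pkg for module, pkg in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Try: pip install " + " ".join(missing))
    else:
        print("All required libraries are installed.")
```

`find_spec` only locates the module without importing it, so the check is cheap even for large packages such as scikit-learn.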

Usage

python trainML.py

Languages

Python 100.0%