Automatic Language Classifier is a machine-learning-based approach to classifying text into three categories:
- English (en)
- Nepali (np)
- Devanagari Nepali (np2)
Here, English refers to text written in the English language, Nepali refers to Nepali text typed in the Roman (Latin) script, and Devanagari Nepali refers to Nepali text written in the Devanagari script.
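To make the difference between the categories concrete: Devanagari text can be told apart by script alone, because its characters sit in the Unicode block U+0900 to U+097F, while English and Roman-typed Nepali share the Latin alphabet and need a learned model to separate. The snippet below is only an illustrative heuristic and not the method this project uses; the function name `devanagari_ratio` is made up for the example.

```python
def devanagari_ratio(text: str) -> float:
    """Return the fraction of non-space characters in the Devanagari block.

    Hypothetical helper for illustration only; it is not part of this project.
    """
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    # The Devanagari Unicode block spans U+0900 through U+097F.
    hits = sum(1 for ch in chars if "\u0900" <= ch <= "\u097F")
    return hits / len(chars)


if __name__ == "__main__":
    for sample in ["नेपाल", "nepal", "timi kasto chau"]:
        print(sample, devanagari_ratio(sample))
```

A ratio near 1 suggests the `np2` class. A ratio near 0 says nothing about whether the text is `en` or `np`, which is where the trained classifier is needed.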
- An NLTK-based corpus was used as the dataset for English
- The Roman-typed Nepali dataset was created by extracting social media data
- The Devanagari dataset was extracted from a Nepali corpus
Extract all the RAR files into a "Preprocessing" folder in the same directory before moving on to usage.
- NLTK
- scikit-learn
- fastText
python trainML.py
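The contents of `trainML.py` are not shown here, so the following is only a rough sketch of how a three-way text classifier of this kind could be trained with scikit-learn. The character n-gram TF-IDF features, the logistic regression model, and the tiny inline samples are all assumptions made for illustration; the real script, its features, and its datasets may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data (made up for this sketch, not taken from the project's datasets).
texts = [
    "how are you doing today", "this is a good book",
    "the weather is nice today", "i am going to school",
    "timi kasto chau", "ma ghar jadai chu",
    "aaja mausam ramro cha", "tapai lai kasto cha",
    "तिमी कस्तो छौ", "म घर जाँदै छु",
    "आज मौसम राम्रो छ", "तपाईंलाई कस्तो छ",
]
labels = ["en"] * 4 + ["np"] * 4 + ["np2"] * 4

# Character n-grams capture both the script and common letter patterns,
# which suits short, informally spelled text.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

samples = ["मेरो नाम राम हो", "where is the library", "mero naam ram ho"]
predictions = model.predict(samples)
for sample, label in zip(samples, predictions):
    print(f"{label}\t{sample}")
```

With so little training data the `en` and `np` predictions are not reliable; the sketch only shows the overall shape of a train-then-predict workflow.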