kirillfx / nltk-language-detection

Automatic language detection with Python and NLTK

nltk-language-detection

Automatic detection of text language with Python and NLTK. This script uses a very simple approach based on stopword comparison: the text is matched against each language's stopword list, and the language whose list shares the most words with the text wins.

Dependencies

You need to install the NLTK package for Python to run this script.

How it works

Just give the script a bunch of text to analyse and it will:

  • Parse and tokenize your text
  • Compare the tokens with all stopwords lists contained in NLTK corpus in all available languages
  • Select the most relevant language
  • Calculate the relevancy level of the selected language
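
The steps above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names (`load_nltk_stopwords`, `tokenize`, `score_languages`, `detect_language`) are hypothetical, the regex tokenizer is a stand-in for whatever tokenizer the script uses, and the "relevancy" shown here (the winner's share of all stopword matches) is only one possible definition, assumed for the example.

```python
import re


def load_nltk_stopwords():
    """Build {language: set of stopwords} from the NLTK stopwords corpus.

    Requires NLTK and the corpus itself, e.g. nltk.download("stopwords").
    """
    from nltk.corpus import stopwords
    return {lang: set(stopwords.words(lang)) for lang in stopwords.fileids()}


def tokenize(text):
    """Lowercase the text and split it into runs of letters.

    Assumption: a simple regex tokenizer stands in for the script's own.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def score_languages(text, stopword_sets):
    """Count, per language, how many distinct tokens are stopwords."""
    tokens = set(tokenize(text))
    return {lang: len(tokens & words) for lang, words in stopword_sets.items()}


def detect_language(text, stopword_sets):
    """Return (best_language, relevancy).

    Assumption: relevancy is the winner's share of all stopword matches.
    Returns (None, 0.0) when no stopword from any language is found.
    """
    scores = score_languages(text, stopword_sets)
    total = sum(scores.values())
    if total == 0:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / total
```

With NLTK and its stopwords corpus installed, a call would look like `detect_language(some_text, load_nltk_stopwords())`. Passing the stopword sets in as an argument also makes it easy to try the function with a small hand-written dictionary.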

Documentation

If you want to know how this script works, have a look at the blog post "Detection de langue en NLP", which I wrote (in French) on my personal blog, le-geek.com.

Languages

Language:Python 100.0%