levitation / estnltk

Estnltk -- Open source tools for Estonian natural language processing

Estnltk provides common natural language processing functionality such as paragraph, sentence and word tokenization, morphological analysis, named entity recognition, etc. for the Estonian language.
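As a rough illustration of the functionality listed above, the following sketch runs a short Estonian text through tokenization, morphological analysis and named entity recognition. It assumes the 1.4.x API, in which a Text object exposes annotations as lazily computed attributes; the helper name analyze and the sample sentence are made up for this example, and later major releases may use a different interface. When estnltk is missing or incompatible, the helper returns None instead of failing.

```python
def analyze(raw_text):
    """Return basic annotations for raw_text, or None if a compatible
    estnltk (assumed here: the 1.4.x API) is not available."""
    try:
        from estnltk import Text  # imported lazily so the sketch degrades gracefully

        text = Text(raw_text)
        return {
            "sentences": text.sentence_texts,            # sentence tokenization
            "words": text.word_texts,                    # word tokenization
            "lemmas": text.lemmas,                       # morphological analysis
            "postags": text.postags,                     # part-of-speech tags
            "named_entities": text.named_entity_texts,   # named entity recognition
        }
    except (ImportError, AttributeError):
        # estnltk is not installed, or it exposes a different API version
        return None


if __name__ == "__main__":
    # Hypothetical sample sentence, used only for illustration.
    result = analyze("Toomas Hendrik Ilves elab Tallinnas. Ta töötas presidendina.")
    if result is None:
        print("A compatible estnltk installation was not found.")
    else:
        for layer, values in result.items():
            print(layer, values)
```

The annotations are computed on first access, so only the layers that are actually requested get built.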

The project is funded by EKT (Eesti Keeletehnoloogia Riiklik Programm, https://www.keeletehnoloogia.ee/).

Installation

The recommended way to install estnltk is with the Anaconda Python distribution and Python 3.5+.

We have installable packages built for osx, windows-64, and linux-64.

The command for installing estnltk is:

conda install -c estnltk -c conda-forge estnltk

If you want to use EstNLTK interactively in a Jupyter notebook, type the command

jupyter notebook

To run our tutorials, download them as a zip file, unpack them to a directory and run the command jupyter notebook in that directory.


If you are unable to use the Anaconda distribution, the alternative way to install estnltk is:

python -m pip install estnltk

This is slower, more error-prone and requires you to have the appropriate compilers for building the scientific computation packages for your platform.

Find more details in the installation tutorial.
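After either installation route, it can be useful to confirm that the packages are visible to the Python interpreter you intend to use. The sketch below is one possible check using only the standard library; the helper name is_installed is hypothetical and not part of estnltk.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be located for import."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # estnltk itself, plus jupyter for running the tutorials
    for name in ("estnltk", "jupyter"):
        status = "found" if is_installed(name) else "missing"
        print("{}: {}".format(name, status))
```

A package reported as missing usually means it was installed into a different environment than the one running the script.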

Documentation

Release 1.4.1 documentation is available at http://estnltk.github.io/estnltk/1.4.1/index.html. For previous versions refer to http://estnltk.github.io/estnltk. For more tools see http://estnltk.github.io.

Additional educational materials on estnltk are available on the web page of the NLP course taught at the University of Tartu: https://courses.cs.ut.ee/2015/pynlp/fall.

Citation

If you use Estnltk in your work, please cite us as follows:

@InProceedings{ORASMAA16.332,
author = {Siim Orasmaa and Timo Petmanson and Alexander Tkachenko and Sven Laur and Heiki-Jaan Kaalep},
title = {EstNLTK - NLP Toolkit for Estonian},
booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)},
year = {2016},
month = {may},
date = {23-28},
location = {Portorož, Slovenia},
editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
publisher = {European Language Resources Association (ELRA)},
address = {Paris, France},
isbn = {978-2-9517408-9-1},
language = {english}
}

License: GNU General Public License v2.0

