stranak / udpipe

UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files


UDPipe


UDPipe is a trainable pipeline for tokenization, tagging, lemmatization and dependency parsing of CoNLL-U files. UDPipe is language-agnostic and can be trained given annotated data in CoNLL-U format. Trained models are provided for nearly all UD treebanks. UDPipe is available as a binary for Linux/Windows/OS X, as a library for C++, Python, Perl, Java and C#, and as a web service. A third-party R package is also available on CRAN.
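Since CoNLL-U is both the training input and the output format mentioned above, a small reader for it may help when inspecting the data. The following is a minimal sketch in plain Python, not part of UDPipe itself: the `Token` class, the `read_conllu` function and the `SAMPLE` sentence are illustrative names and made-up data. It assumes the standard CoNLL-U layout of ten tab-separated columns per token line, `#` comment lines, and a blank line between sentences.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Token:
    """One CoNLL-U token line; all ten columns are kept as raw strings."""
    id: str
    form: str
    lemma: str
    upos: str
    xpos: str
    feats: str
    head: str
    deprel: str
    deps: str
    misc: str


def read_conllu(text: str) -> Iterator[List[Token]]:
    """Yield sentences (lists of Token) from CoNLL-U formatted text."""
    sentence: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line terminates the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comments (e.g. "# text = ...") are skipped here.
            continue
        columns = line.split("\t")
        if len(columns) != 10:
            raise ValueError(f"expected 10 columns, got {len(columns)}: {line!r}")
        sentence.append(Token(*columns))
    if sentence:
        # Handle input that does not end with a blank line.
        yield sentence


# Hand-written sample for illustration only, not real UDPipe output.
SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    for sent in read_conllu(SAMPLE):
        for tok in sent:
            print(tok.id, tok.form, tok.lemma, tok.upos, tok.head, tok.deprel)
```

The sketch keeps every column as a string so that multiword-token ranges and empty-node IDs in the first column do not break parsing; converting `id` and `head` to integers is left to the caller.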

UDPipe is free software distributed under the Mozilla Public License 2.0. The linguistic models are free for non-commercial use and distributed under the CC BY-NC-SA license, although for some models the original data used to create the model may impose additional licensing conditions. UDPipe is versioned using Semantic Versioning.

Copyright 2017 by Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Czech Republic.

The UDPipe website http://ufal.mff.cuni.cz/udpipe contains download links for both the released packages and the trained models, hosts the documentation, and offers an online demo.

The UDPipe development repository http://github.com/ufal/udpipe is hosted on GitHub.

Third-party contribution: instructions for building a UDPipe REST server as a Docker image are available at http://github.com/samisalkosuo/udpipe-rest-server-docker. Instructions for training UDPipe language models using a Docker image are available there as well.

