Pie Models
This repository contains pretrained models for Pie (A Framework for Joint Learning of Sequence Labeling Tasks).
More on Pie: https://github.com/emanjavacas/pie.
Find a model
Models are arranged by language. TODO: add a JSON documentation file per model.
German (de)
german-ren.model.tar
: Lemmatizer pretrained on a subset of the Referenzkorpus Mittelniederdeutsch/Niederrheinisch: https://www.slm.uni-hamburg.de/ren.html
Spanish (es)
spanish-AnCora.model.tar
: Lemmatizer pretrained on the AnCora corpus for Spanish (part of Universal Dependencies)
Old French (fro)
french-Geste.model.tar
: Lemmatizer pretrained on the Geste corpus
fro-poslemmes_cat-lemma-2019_01_22-02_34_11.tar
: Lemmatizer and POS-tagger trained on the Geste corpus and other Old French data from the École des chartes.
Target task: lemma.
Accuracy on test data
lemma: 0.9383
pos: 0.9473
fro-poslemmes_cat-lemma-2019_01_23-00_34_12
: same as the previous one, but using pre-trained word embeddings from a large unlabelled corpus.
Target task: lemma.
Accuracy on test data
lemma: 0.9409
pos: 0.9468
fro-poslemmes_cat-lemma-2019_01_24-00_05_57.tar
: same as the previous one, but using convolutions (cnn) for the character embeddings.
Target task: lemma.
Accuracy on test data
lemma: 0.9462
pos: 0.9509
model_fro_poslemmesmorph.tar
: POS-tagger, lemmatizer and morphological analyzer trained on the Geste corpus
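To compare the three fro-poslemmes variants at a glance, the accuracies reported above can be put in a small table and queried. This is only a sketch: the numbers are copied from the entries above, while the short labels (`base`, `plus_pretrained_embeddings`, `plus_cnn_char_embeddings`) and the helper functions are hypothetical names introduced here and are not part of Pie.

```python
# Test accuracies copied from the three fro-poslemmes entries above.
# The dictionary keys are hypothetical labels chosen for readability.
REPORTED = {
    "base": {"lemma": 0.9383, "pos": 0.9473},
    "plus_pretrained_embeddings": {"lemma": 0.9409, "pos": 0.9468},
    "plus_cnn_char_embeddings": {"lemma": 0.9462, "pos": 0.9509},
}


def best_model(task):
    """Return the label of the variant with the highest reported accuracy on `task`."""
    return max(REPORTED, key=lambda name: REPORTED[name][task])


def gain_over_base(name, task):
    """Absolute accuracy difference between a variant and the base model, rounded to 4 places."""
    return round(REPORTED[name][task] - REPORTED["base"][task], 4)


if __name__ == "__main__":
    for task in ("lemma", "pos"):
        winner = best_model(task)
        print(task, winner, gain_over_base(winner, task))
```

The differences are small and come from single reported test runs, so they are best read as a rough guide when picking a model rather than as a definitive ranking.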
Latin (lat)
capitula.model.tar
: Lemmatizer pretrained on a non-open-source dataset of Medieval Latin
Turkish (tur)
turkish-IMST.model.tar
: Lemmatizer pretrained on the IMST corpus for Turkish (part of Universal Dependencies)
Example config file for training a lemmatizer
lemma.config.json is an example config file for training a lemmatizer to reasonably good accuracy.
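Before training, it can help to see which settings the example config defines. The sketch below lists the top-level keys of a config file and the type of each value. It assumes the file parses as plain JSON (if it contains comments, they would have to be stripped first) and makes no assumption about which keys are present; `load_config` and `summarize` are hypothetical helpers, not part of Pie.

```python
import json
import os


def load_config(path):
    """Parse a JSON config file and return its contents as a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize(config):
    """Return sorted (key, type name) pairs for the top-level settings."""
    return sorted((key, type(value).__name__) for key, value in config.items())


if __name__ == "__main__":
    # Assumption: the script is run from the directory holding the config file.
    config_path = "lemma.config.json"
    if os.path.exists(config_path):
        for key, kind in summarize(load_config(config_path)):
            print(f"{key}: {kind}")
```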
PIE
Installation
For more information, check the Pie repository (linked at the top of this document); in short:
virtualenv env -p python3.7
source env/bin/activate
pip3 install -r requirements.txt