ortografix

Welcome to ortografix, a seq2seq model for automatic orthographic simplification, implemented in PyTorch 1.4.

Install

via pip:

pip3 install ortografix

or, after a git clone:

python3 setup.py install

Train

To train a model, run:

ortografix train \
--data /abs/path/to/training/data \
--model-type gru \
--shuffle \
--hidden-size 256 \
--num-layers 1 \
--bias \
--dropout 0 \
--learning-rate 0.01 \
--epochs 10 \
--print-every 100 \
--use-teacher-forcing \
--teacher-forcing-ratio 0.5 \
--output-dirpath /abs/path/to/output/directory/whereto/save/model \
--with-attention \
--character-based
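
The --use-teacher-forcing and --teacher-forcing-ratio 0.5 options control how often the decoder is fed the gold target token rather than its own previous prediction during training. The following is a minimal, framework-free sketch of that idea, not ortografix's actual code. Two things are assumptions: the function name decode_with_teacher_forcing and its step callback are hypothetical, and the coin is flipped once per sequence (as in the common PyTorch seq2seq tutorial pattern) rather than once per decoding step.

```python
import random
from typing import Callable, List, Optional


def decode_with_teacher_forcing(
    step: Callable[[str], str],  # hypothetical: one decoder step, previous token -> predicted token
    target: List[str],
    sos: str = "<s>",
    teacher_forcing_ratio: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Run one decoding pass, optionally feeding gold tokens back as inputs."""
    rng = rng or random.Random()
    # Assumption: teacher forcing is decided once for the whole sequence.
    use_teacher_forcing = rng.random() < teacher_forcing_ratio
    prev = sos
    outputs = []
    for gold in target:
        pred = step(prev)
        outputs.append(pred)
        # With teacher forcing, the next input is the gold token;
        # otherwise it is the model's own prediction.
        prev = gold if use_teacher_forcing else pred
    return outputs
```

With a ratio of 1.0 the decoder always sees gold inputs; with 0.0 it always consumes its own outputs, which is how it is run at inference time.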

Test

Qualitative evaluation

To qualitatively evaluate the output of the model on a set of 10 randomly selected sentences from a given dev/test set, run:

ortografix evaluate \
--data /abs/path/to/test/data.txt \
--model /abs/path/to/model/directory/ \
--random 10

Quantitative evaluation

To quantitatively evaluate the output of the model on a given dev/test set, run:

ortografix evaluate \
--data /abs/path/to/test/data.txt \
--model /abs/path/to/model/directory

Quantitative evaluation will return:

The sum of all edit (Levenshtein) distances computed across all test pairs
The average edit distance computed across all test pairs
The average normalized edit distance
The average normalized edit similarity

All measures are computed via textdistance.
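
To make the four measures concrete, here is a minimal pure-Python sketch of what they compute. It is an illustration, not ortografix's evaluation code. It assumes each test pair is a (prediction, gold) string pair compared character by character, and that the normalized distance is the Levenshtein distance divided by the length of the longer string, which is how we understand textdistance's textdistance.levenshtein.normalized_distance to behave; normalized similarity is then 1 minus that value. The helper names below are hypothetical.

```python
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    # Assumption: normalize by the longer string; two empty strings give 0.0.
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def evaluate(pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """Aggregate the four reported measures over (prediction, gold) pairs."""
    if not pairs:
        raise ValueError("need at least one test pair")
    dists = [levenshtein(p, g) for p, g in pairs]
    norms = [normalized_distance(p, g) for p, g in pairs]
    n = len(pairs)
    return {
        "sum_distance": sum(dists),
        "avg_distance": sum(dists) / n,
        "avg_normalized_distance": sum(norms) / n,
        "avg_normalized_similarity": sum(1 - d for d in norms) / n,
    }
```

For example, the pairs ("kitten", "sitting") and ("abc", "abc") have distances 3 and 0, so the sum is 3 and the average is 1.5.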