Py-Elotl

Python package for Natural Language Processing (NLP), focused on low-resource languages spoken in Mexico.

Developed by:

Paul Aguilar @penserbjorne, paul.aguilar.enriquez@hotmail.com
Robert Pugh @Lguyogiro, robertpugh408@gmail.com

Requires python>=3.X

Development status: Pre-Alpha (see the package classifiers).
pip package: elotl
GitHub repository: ElotlMX/py-elotl

Installation

Using `pip`

pip install elotl

From source

git clone https://github.com/ElotlMX/py-elotl.git
cd py-elotl
pip install -e .

Use

Working with corpora

import elotl.corpus

Listing available corpora

Code:

print("Name\t\tDescription")
list_of_corpus = elotl.corpus.list_of_corpus()
for row in list_of_corpus:
    print(row)

Output:

Name		Description
['axolotl', 'Is a Spanish-Nahuatl parallel corpus']
['tsunkua', 'Is a Spanish-otomí parallel corpus']
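
Since each row is a name and description pair, the listing can be turned into a lookup table. The following is a minimal sketch, assuming rows shaped like the output above. `corpus_descriptions` is a hypothetical helper, not part of elotl, and the sample rows are copied from the output above so the snippet runs without the package installed.

```python
def corpus_descriptions(rows):
    """Map each corpus name to its description.

    Assumes every row is a [name, description] pair, as in the output above.
    """
    return {name: description for name, description in rows}


# Sample rows copied from the output above; with the package installed,
# elotl.corpus.list_of_corpus() would supply them instead.
sample_rows = [
    ['axolotl', 'Is a Spanish-Nahuatl parallel corpus'],
    ['tsunkua', 'Is a Spanish-otomí parallel corpus'],
]

descriptions = corpus_descriptions(sample_rows)
for name in sorted(descriptions):
    print(f"{name}\t\t{descriptions[name]}")
```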

Loading a corpus

If a non-existent corpus is requested, a value of 0 is returned.

axolotl = elotl.corpus.load('axolotlr')
if axolotl == 0:
    print("The name entered does not correspond to any corpus")
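
The 0 return value is easy to overlook, so a thin wrapper can turn it into an exception. This is only a sketch: `load_or_raise` and `fake_load` are hypothetical names, not part of elotl. The loader is passed in as an argument so the snippet runs without the package; in real use one would pass `elotl.corpus.load`.

```python
def load_or_raise(name, loader):
    """Call loader(name) and raise ValueError if it returns the 0 sentinel."""
    corpus = loader(name)
    if corpus == 0:
        raise ValueError(f"The name {name!r} does not correspond to any corpus")
    return corpus


def fake_load(name):
    """Stand-in loader: returns rows for a known name and 0 otherwise."""
    # Placeholder row, not real corpus data.
    known = {'axolotl': [['sentence A', 'sentence B', '', 'document']]}
    return known.get(name, 0)


rows = load_or_raise('axolotl', fake_load)
print(len(rows))

try:
    load_or_raise('axolotlr', fake_load)
except ValueError as error:
    print(error)
```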

If the name of an existing corpus is given, a list of rows is returned.

axolotl = elotl.corpus.load('axolotl')
for row in axolotl:
    print(row)

['Hay que adivinar: un pozo, a la mitad del cerro, te vas a encontrar.', 'See tosaasaanil, see tosaasaanil. Tias iipan see tepeetl, iitlakotian tepeetl, tikoonextis san see aameyalli.', '', 'Adivinanzas nahuas']

Each row of the list has four elements, in this order:

non_original_language
original_language
variant
document_name

tsunkua = elotl.corpus.load('tsunkua')
for row in tsunkua:
    print(row[0]) # language 1
    print(row[1]) # language 2
    print(row[2]) # variant
    print(row[3]) # document

Una vez una señora se emborrachó
nándi na ra t'u̱xú bintí
Otomí del Estado de México (ots)
El otomí de toluca, Yolanda Lastra
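
Positional access such as `row[2]` can be hard to read, so a named tuple may help. The sketch below is not part of elotl: `CorpusRow` is a hypothetical name, its fields follow the four fields listed above, and the sample row is copied from the tsunkua output above so the snippet runs without the package.

```python
from collections import namedtuple

# Field names follow the four fields listed above, in the same order.
CorpusRow = namedtuple(
    'CorpusRow',
    ['non_original_language', 'original_language', 'variant', 'document_name'],
)

# Sample row copied from the tsunkua output above.
sample_row = [
    'Una vez una señora se emborrachó',
    "nándi na ra t'u̱xú bintí",
    'Otomí del Estado de México (ots)',
    'El otomí de toluca, Yolanda Lastra',
]

row = CorpusRow(*sample_row)
print(row.original_language)
print(row.document_name)
```

A `CorpusRow` still supports indexing, so code written against `row[0]` keeps working.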

Package structure

The following structure is a reference. As the package grows it will be better documented.

elotl/                              Top-level package
          __init__.py               Initialize the package
          corpora/                  Corpus data
          corpus/                   Subpackage to load corpora
          nahuatl/                  Nahuatl language subpackage
                  orthography.py    Module to normalize Nahuatl orthography and phonemes
          utils/                    Subpackage with useful functions and files
                  fst/              Finite State Transducer functions
                        att/        Module with static .att files
test/                               Unit test scripts

Development

Requirements

python3
HFST
GNU make
virtualenv
Python packages
- setuptools
- wheel

Quick build

virtualenv --python=/usr/bin/python3 venv
source venv/bin/activate
make all

Step by step

Build FSTs

Build the FSTs with make.

make fst

Create a virtual environment and activate it.

virtualenv --python=/usr/bin/python3 venv
source venv/bin/activate

Update `pip` and generate distribution files.

python -m pip install --upgrade pip
python -m pip install --upgrade setuptools wheel
rm -rf build/ dist/
python setup.py clean sdist bdist_wheel

Testing the package locally

python -m pip install -e .

Send to PyPI

python -m pip install twine
twine upload dist/*

License

Mozilla Public License 2.0 (MPL 2.0)
