Python package for Natural Language Processing (NLP), focused on low-resource languages spoken in Mexico.
This is a project of Comunidad Elotl.
Developed by:
- Paul Aguilar @penserbjorne, paul.aguilar.enriquez@hotmail.com
- Robert Pugh @Lguyogiro, robertpugh408@gmail.com
Requires python>=3.X
- Development Status: Pre-Alpha. Read Classifiers
- pip package: elotl
- GitHub repository: ElotlMX/py-elotl
pip install elotl
git clone https://github.com/ElotlMX/py-elotl.git
cd py-elotl
pip install -e .
import elotl.corpus
Code:
print("Name\t\tDescription")
list_of_corpus = elotl.corpus.list_of_corpus()
for row in list_of_corpus:
    print(row)
Output:
Name Description
['axolotl', 'Is a Spanish-Nahuatl parallel corpus']
['tsunkua', 'Is a Spanish-otomí parallel corpus']
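Since each row of the listing is a name and description pair, it can be turned into a dictionary to check a name before loading it. The following is only a sketch: the `listing` variable hard-codes the output shown above instead of calling `elotl.corpus.list_of_corpus()`, and `descriptions` is an illustrative name, not part of the package.

```python
# Listing copied from the output above; in practice it would come from
# elotl.corpus.list_of_corpus().
listing = [
    ['axolotl', 'Is a Spanish-Nahuatl parallel corpus'],
    ['tsunkua', 'Is a Spanish-otomí parallel corpus'],
]

# Each row has exactly two items, so dict() maps name -> description.
descriptions = dict(listing)

print('axolotl' in descriptions)   # membership test on corpus names
print(descriptions['tsunkua'])     # description for a known corpus
```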
If a non-existent corpus is requested, a value of 0 is returned.
axolotl = elotl.corpus.load('axolotlr')
if axolotl == 0:
    print("The name entered does not correspond to any corpus")
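Callers who prefer an exception over checking for 0 could wrap the loader. This is a minimal sketch, not part of the package: `load_or_raise` and `fake_load` are hypothetical names, and `fake_load` is a stand-in with placeholder data so the example runs without elotl installed. In real use one would pass `elotl.corpus.load` as the loader.

```python
def load_or_raise(name, loader):
    """Call loader(name) and raise LookupError if it returns the 0 sentinel."""
    corpus = loader(name)
    # 0 is the documented return value for a non-existent corpus name.
    if isinstance(corpus, int) and corpus == 0:
        raise LookupError(f"The name {name!r} does not correspond to any corpus")
    return corpus


def fake_load(name):
    """Stand-in for elotl.corpus.load; the row below is placeholder data."""
    known = {"axolotl": [["texto", "tlahtolli", "", "documento"]]}
    return known.get(name, 0)


try:
    load_or_raise("axolotlr", fake_load)
except LookupError as error:
    print(error)
```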
If an existing corpus is entered, a list is returned.
axolotl = elotl.corpus.load('axolotl')
for row in axolotl:
    print(row)
['Hay que adivinar: un pozo, a la mitad del cerro, te vas a encontrar.', 'See tosaasaanil, see tosaasaanil. Tias iipan see tepeetl, iitlakotian tepeetl, tikoonextis san see aameyalli.', '', 'Adivinanzas nahuas']
Each element of the list is itself a list with four items, at indices 0 to 3:
- non_original_language
- original_language
- variant
- document_name
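To access these four items by name instead of by index, rows can be wrapped in a named tuple. This is a sketch under the assumption that every row has exactly four items in the order listed above. `ParallelRow` and `to_named_rows` are illustrative names, not part of elotl, and the sample row is taken from the tsunkua corpus.

```python
from collections import namedtuple

# Field names follow the four items listed above, in the same order.
ParallelRow = namedtuple(
    "ParallelRow",
    ["non_original_language", "original_language", "variant", "document_name"],
)


def to_named_rows(corpus):
    """Convert a list of four-item rows into a list of ParallelRow tuples."""
    return [ParallelRow(*row) for row in corpus]


# One row in the shape returned by elotl.corpus.load('tsunkua').
sample = [[
    "Una vez una señora se emborrachó",
    "nándi na ra t'u̱xú bintí",
    "Otomí del Estado de México (ots)",
    "El otomí de toluca, Yolanda Lastra",
]]

rows = to_named_rows(sample)
print(rows[0].variant)
print(rows[0].document_name)
```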
tsunkua = elotl.corpus.load('tsunkua')
for row in tsunkua:
    print(row[0]) # language 1
    print(row[1]) # language 2
    print(row[2]) # variant
    print(row[3]) # document
Una vez una señora se emborrachó
nándi na ra t'u̱xú bintí
Otomí del Estado de México (ots)
El otomí de toluca, Yolanda Lastra
The following structure is for reference only. It will be documented in more detail as the package grows.
elotl/                          Top-level package
    __init__.py                 Initialize the package
    corpora/                    Corpus data
    corpus/                     Subpackage to load corpus
    nahuatl/                    Nahuatl language subpackage
        orthography.py          Module to normalize Nahuatl orthography and phonemes
    utils/                      Subpackage with useful functions and files
        fst/                    Finite State Transducer functions
            att/                Module with static .att files
test/                           Unit test scripts
- python3
- HFST
- GNU make
- virtualenv
- Python packages
  - setuptools
  - wheel
virtualenv --python=/usr/bin/python3 venv
source venv/bin/activate
make all
Build the FSTs with make:
make fst
virtualenv --python=/usr/bin/python3 venv
source venv/bin/activate
python -m pip install --upgrade pip
python -m pip install --upgrade setuptools wheel
rm -rf build/ dist/
python setup.py clean sdist bdist_wheel
python -m pip install -e .
python -m pip install twine
twine upload dist/*