dawngerpony / sentencer

Sentence extractor with vocabulary filter

Sentencer

A program to extract translatable sentences from a corpus, based on a known vocabulary.

The vocabulary is stored in a CSV file.
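The idea of filtering sentences by a known vocabulary can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the function names (`load_vocabulary`, `split_sentences`, `extract_known_sentences`), the CSV layout (word in the first column) and the regex-based sentence splitter are all hypothetical. The real program may well use NLTK for tokenizing, as the setup script suggests.

```python
import csv
import io
import re


def load_vocabulary(csv_file, column=0):
    """Collect lowercased words from one CSV column (assumed layout: word first)."""
    reader = csv.reader(csv_file)
    return {
        row[column].strip().lower()
        for row in reader
        if len(row) > column and row[column].strip()
    }


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_known_sentences(text, vocabulary):
    """Keep only sentences in which every word appears in the vocabulary."""
    kept = []
    for sentence in split_sentences(text):
        # Words are runs of letters, optionally with one internal apostrophe.
        words = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", sentence.lower())
        if words and all(word in vocabulary for word in words):
            kept.append(sentence)
    return kept


if __name__ == "__main__":
    # Hypothetical vocabulary file contents; the second column is ignored here.
    vocab_csv = io.StringIO("i,pronoun\nwake,verb\nup,adverb\nearly,adverb\n")
    vocabulary = load_vocabulary(vocab_csv)
    corpus = "I wake up early. Then I eat breakfast."
    print(extract_known_sentences(corpus, vocabulary))
```

With the sample data above, only the first sentence survives, because "then", "eat" and "breakfast" are not in the vocabulary. Requiring every word to be known is the strictest possible filter; a real tool might instead allow a small number of unknown words per sentence.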

Corpus resources

Getting started

(NB: you may want to set up a virtualenv first.)

pip install -r requirements.txt
pip install -r requirements_dev.txt
python scripts/nltk_download.py

Run the tests:

flake8
pytest

Run the program on a sample corpus:

python sentencer/main.py my-day


Languages

Python 95.8%, Shell 4.2%