This will be a set of tools for building a searchable Latin corpus from the texts on theLatinLibrary.com.
So far, I have finished a part-of-speech tagger, latin_tag.py, which uses Whitaker's Words to tag text documents.
The tagger. Feed it one or more text files as command-line arguments and it will produce a tagged equivalent of each in FILENAME.tagged.
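The file-in, file-out behaviour described above can be sketched roughly as follows. This is not the actual latin_tag.py code: `tag_word` is a hypothetical placeholder for the Whitaker's Words lookup, and the `word/TAG` output format is an assumption made only for illustration.

```python
import sys
from pathlib import Path


def tag_word(word: str) -> str:
    """Hypothetical stand-in for the Whitaker's Words lookup.

    The real tagger would ask Whitaker's Words for the part of speech;
    here every word simply gets the placeholder tag UNK.
    """
    return f"{word}/UNK"


def tag_file(path: str) -> Path:
    """Tag one text file and write the result to FILENAME.tagged."""
    text = Path(path).read_text(encoding="utf-8")
    # Split on whitespace and tag each token (assumed format: word/TAG).
    tagged = " ".join(tag_word(word) for word in text.split())
    out_path = Path(path + ".tagged")
    out_path.write_text(tagged + "\n", encoding="utf-8")
    return out_path


if __name__ == "__main__":
    # Every command-line argument is treated as a file to tag.
    for arg in sys.argv[1:]:
        print(tag_file(arg))
```

Run with several file names as arguments, a script shaped like this would write one .tagged file next to each input.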
Dependencies:
- Whitaker's Words
- bash

Known issues:
- Cannot handle text containing semicolons or square brackets ([]). Will fix soon.

To do:
- A system for generating the tagged corpus
- A way to search the corpus
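
Until the semicolon and bracket limitation is fixed, one possible workaround is to strip those characters from a text before tagging it. The sketch below is an assumption about how that could be done, not part of latin_tag.py; the name `strip_unsupported` is hypothetical.

```python
import re

# Characters the tagger is known not to handle: semicolons and square brackets.
UNSUPPORTED = re.compile(r"[;\[\]]")


def strip_unsupported(text: str) -> str:
    """Replace each unsupported character with a space.

    A space is used rather than an empty string so that neighbouring
    words do not get fused together.
    """
    return UNSUPPORTED.sub(" ", text)


if __name__ == "__main__":
    sample = "arma; virum [que] cano"
    print(strip_unsupported(sample).split())
```

Note that this discards punctuation that may matter for later searching, so it is only a stopgap.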