LukeSmithxyz / corpus-latinum

Luke's Latin Tagger and (under construction) Corpus

Corpus Latinum Lucae

This will be a set of tools for creating a searchable Latin corpus built from texts on theLatinLibrary.com.

So far, I've finished a part-of-speech tagger, latin_tag.py, which uses Whitaker's Words to tag text documents.

latin_tag.py

The tagger. Feed it one or more text files as command-line arguments and it will produce a tagged equivalent of each in FILENAME.tagged.
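As a rough illustration of that command-line flow (not the actual contents of latin_tag.py), here is a minimal sketch that loops over the file arguments and writes one `.tagged` file per input. The `tag_word` function is a hypothetical placeholder for the Whitaker's Words lookup, the `word/TAG` output format is invented for the example, and the sketch assumes FILENAME includes the original extension, so `caesar.txt` becomes `caesar.txt.tagged`.

```python
import sys
from pathlib import Path


def tag_word(word):
    """Hypothetical placeholder: the real tagger consults Whitaker's Words here."""
    return f"{word}/UNK"


def tag_file(path):
    """Write a tagged copy of `path` next to it as FILENAME.tagged; return the new path."""
    out_path = path.with_name(path.name + ".tagged")
    with path.open(encoding="utf-8") as src, out_path.open("w", encoding="utf-8") as dst:
        for line in src:
            # Tag each whitespace-separated token and keep one output line per input line.
            dst.write(" ".join(tag_word(w) for w in line.split()) + "\n")
    return out_path


def main(argv):
    # Every argument after the script name is treated as an input text file.
    for name in argv[1:]:
        print(tag_file(Path(name)))


if __name__ == "__main__":
    main(sys.argv)
```

With a sketch like this, running the script on `caesar.txt` and `cicero.txt` would leave `caesar.txt.tagged` and `cicero.txt.tagged` alongside the originals.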

Dependencies:

Known bugs

  • Can't handle text containing semicolons or square brackets ([]). Will fix soon.
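Until that is fixed, one possible workaround is to strip punctuation before the words ever reach the tagger. The sketch below is an assumption on my part about where the problem lies, not the planned fix: it pulls out only runs of ASCII letters, so semicolons and brackets are dropped. The name `latin_words` is hypothetical, and the pattern would also split words that contain macrons or ligatures.

```python
import re

# Runs of ASCII letters only; punctuation such as ';', '[' and ']' is skipped.
WORD_RE = re.compile(r"[A-Za-z]+")


def latin_words(text):
    """Return the alphabetic words in `text`, discarding all punctuation."""
    return WORD_RE.findall(text)


print(latin_words("Veni; vidi [sic] vici."))
```

This loses the punctuation entirely, so it only suits a tagger that does not need to reproduce the original punctuation in its output.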

Next on the list:

  • system for generating the tagged corpus
  • way to search the corpus

About

License:Other


Languages

Language:Python 100.0%