Classical Language Toolkit's repositories
latin_proper_names_cltk
A list of ~40K Classical Latin proper names
sanskrit_parallel_gitasupersite
Parallel corpus
greek_training_set_sentence_cltk
Training sets and tokenizer for the Classical Greek language, for use with CLTK
sanskrit_text_jnu
Sanskrit Corpora
greek_proper_names_cltk
A list of ~144K Classical Greek proper names
greek_word2vec_cltk
Greek Word2Vec models
sanskrit_text_sacred_texts
Sanskrit texts from sacred-texts.com
greek_pos_edit_xenophon_anabasis
A human-editable version of a POS-tagged text of Xenophon's Anabasis
latin_text_corpus_grammaticorum_latinorum
Collected Latin data from Corpus Grammaticorum Latinorum
latin_word2vec_cltk
Latin Word2Vec models
latin_text_antique_digiliblt
Antique Latin Corpus from digilibLT
latin_text_lacus_curtius
Collected Latin files from LacusCurtius
tibetan_pos_tdc
POS-tagged corpora from Tibetan in Digital Communication
chinese_text_cbeta_indices
Indices to the CBETA corpus
chinese_text_sheffield
Texts from the Sheffield Corpus of Chinese
csel_openphilology_corpus
CSEL corpus based on https://github.com/OpenGreekAndLatin/csel-dev/
greek_text_lacus_curtius
Collected Greek texts from LacusCurtius
latin_treebank_index_thomisticus
Treebank of the works of Thomas Aquinas
tibetan_lexica_tdc
Lexica compiled by Tibetan in Digital Communication
treebank_data
Perseus Treebank Data