So Miyagawa's repositories
coptic-xml-tool
Coptic SCRIPTORIUM XML editor tool
treebank_data
Perseus Treebank Data
SunoikisisDC-2016
Planning Seminar and SS 2016 Course
KR6a0005
佛般泥洹經 (Fo bannihuan jing) - Western Jin - Bo Fazu (白法祖)
mozc
Mozc - a Japanese Input Method Editor designed for multi-platform
deipnosophistae-reuses
Citable analyses of quotations and text reuses in the Deipnosophistae
homeric-reuse
Citable analyses of Homeric text reuse in the Deipnosophistae
canonical-greekLit
XML Canonical resources for Greek Literature
isri-ocr-evaluation-tools
Automatically exported from code.google.com/p/isri-ocr-evaluation-tools
tesseract
Tesseract Open Source OCR Engine (main repository)
ANNIS
ANNIS is an open source, versatile web browser-based search and visualization architecture for complex multilevel linguistic corpora with diverse types of annotation.
ocropy
Python-based tools for document analysis and OCR
PoCoTo
Home of the Postcorrection Tool
canonical
This will be the base repo for all text and annotation data published in the PDL
normalizer
Normalizes orthography
lexical-taggers
Lexical taggers (language of origin, lemmatizer) for Sahidic Coptic
keyboard
JavaScript keyboard
corpora-legacy-releases
Corpora-Legacy-Releases
tokenizers
Coptic SCRIPTORIUM Tokenization Script
converter-complex-python
Encode text from legacy ASCII font by Van Damme & Wurst to UTF-8
TheoryOfComputation
Notes from a lecture on the Theory of Computation at the University of Tokyo.
WebAlgo-Java-Class
Most of the code I write is closed source, and I wanted to give others a peek into my Java world. So I have made available, upon request, source code and documentation from a Web Algorithms class, part of a Java Certification track at UCB, which I took while independently learning Java. Granted, it is probably not up to the standard of what I do today, but it is at least something for people to look at should they need to know that I have Java experience. During the class I wrote things as diverse as a Grails GWT mashup plugin for Eclipse, all the way to a LingPipe- and Lucene-based document classifier and a grammar processor for word separation in Coptic. The classifier (K-Means) and the processor were capable of Coptic word splitting with high accuracy given very little training data, and could distinguish between poetry, religious texts, and business documents.