mkrnr / lexer-parser

A parser that extracts text from the Wikipedia, Enron, Acquis, and Reuters XML corpora. Currently a work in progress.

Languages

Java 99.9%, Shell 0.1%