Generates corpora from various ebook formats. Supported output: XML (CQPweb, CWB) and TXT (AntWebCorpusFramework, AntConc, AntPConc).
Currently hosted in a private repository. The functionality will be split into several TextBlob model extensions.
See http://textblob-de.readthedocs.org/en/latest/extensions.html for progress status.