GiuseppeDellaCorte's repositories
It-Chapterize
A tool for extracting chapters from raw-text Italian e-books from Project Gutenberg. Regular expressions match the chapter headings, and the text between consecutive headings is extracted as a chapter.
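The heading-matching approach described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the heading pattern (`CAPITOLO` followed by a Roman or Arabic numeral on its own line) and the names `HEADING_RE` and `split_chapters` are assumptions made for the example.

```python
import re

# Assumed heading format: "CAPITOLO" plus a Roman or Arabic numeral, alone on a line.
# Real Gutenberg texts vary, so this pattern is only a hypothetical starting point.
HEADING_RE = re.compile(
    r"^[ \t]*(CAPITOLO[ \t]+(?:[IVXLCDM]+|\d+))\.?[ \t]*$",
    re.MULTILINE,
)

def split_chapters(text):
    """Return a list of (heading, body) pairs found in text."""
    matches = list(HEADING_RE.finditer(text))
    chapters = []
    for i, match in enumerate(matches):
        # A chapter body runs from the end of its heading
        # to the start of the next heading (or the end of the text).
        start = match.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chapters.append((match.group(1), text[start:end].strip()))
    return chapters

if __name__ == "__main__":
    sample = "Prefazione\n\nCAPITOLO I\n\nPrimo testo.\n\nCAPITOLO II\n\nSecondo testo.\n"
    for heading, body in split_chapters(sample):
        print(heading, "->", body)
```

Anything before the first heading (front matter such as a preface) is skipped in this sketch; a real tool would also need to strip the Project Gutenberg header and footer.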
WikiSearchEngine
A Python implementation of a tf-idf-based search engine for a subset of the English Wikipedia.
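As a rough idea of what tf-idf ranking involves, here is a small self-contained sketch. It is not taken from the repository: the function names (`tokenize`, `build_index`, `search`), the raw-count term frequency, the `log(N / df)` idf variant, and cosine-similarity scoring are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """docs maps a document id to its text. Returns (tf-idf vectors, idf table)."""
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(term for counts in term_counts.values() for term in counts)
    n_docs = len(docs)
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    vectors = {
        doc_id: {term: count * idf[term] for term, count in counts.items()}
        for doc_id, counts in term_counts.items()
    }
    return vectors, idf

def search(query, vectors, idf):
    """Return ids of matching documents, ranked by cosine similarity to the query."""
    query_counts = Counter(t for t in tokenize(query) if t in idf)
    query_vec = {t: c * idf[t] for t, c in query_counts.items()}
    query_norm = math.sqrt(sum(w * w for w in query_vec.values()))
    scores = {}
    for doc_id, vec in vectors.items():
        dot = sum(w * vec.get(t, 0.0) for t, w in query_vec.items())
        if dot > 0:  # a positive dot product implies both norms are non-zero
            doc_norm = math.sqrt(sum(w * w for w in vec.values()))
            scores[doc_id] = dot / (query_norm * doc_norm)
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    docs = {
        "a": "Rome is the capital of Italy",
        "b": "Paris is the capital of France",
        "c": "Python is a programming language",
    }
    vectors, idf = build_index(docs)
    print(search("capital of Italy", vectors, idf))
```

Terms that occur in every document get an idf of zero under this variant, so they contribute nothing to the ranking. For a corpus the size of a Wikipedia subset, an inverted index would normally replace the full scan over `vectors`.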