A brief overview of what Theseus is, and is not
Theseus is a Python package that includes several modules for webpage retrieval and text processing.
It is the basis for deploying a system built on bash scripts and Python.
It is responsible for collecting and processing newspaper pages for the observatorium.
Theseus will eventually consist of four groups of programs, modules, and scripts inside the theseus package:
- Crawler (for online gathering of news items)
- Processor (for processing of textual data)
- Utils (accessory methods and utilities for pre- and post-processing)
- Examples (to help users start using Theseus)
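
To make the crawl-then-process flow described above more concrete, here is a minimal sketch using only the Python standard library. It does not use or reflect the actual theseus API: the names `fetch_page`, `TextExtractor`, `extract_text`, and `word_counts` are hypothetical and exist only for this illustration.

```python
import re
import urllib.request
from collections import Counter
from html.parser import HTMLParser


def fetch_page(url, timeout=10):
    """Crawler step (hypothetical helper): download a page and return its HTML."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Processor step (hypothetical helper): reduce HTML to plain text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def word_counts(text):
    """Processor step (hypothetical helper): count lowercase word occurrences."""
    return Counter(re.findall(r"\w+", text.lower()))
```

A page would be passed through the three helpers in order, for example `word_counts(extract_text(fetch_page(url)))`, where `url` is whichever newspaper page is being collected.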