The goal of this project is to scrape the web in a simple yet powerful manner. You can install the project on multiple machines; each instance reads messages from a Kafka topic, enriches them with HTML content, and pushes them to another topic. The project has been tested on 50,000,000 messages in a few hours, which created a stream of 10 TB of data per hour.
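
The read, enrich, push loop described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the message format (JSON with a `url` field), the added `html` and `error` fields, and the function names `fetch_html`, `enrich`, and `run` are all assumptions made for the example. The Kafka client is deliberately left out, so `incoming` stands in for the values read from the input topic and `publish` stands in for whatever sends bytes to the output topic.

```python
import json
import urllib.request
from typing import Callable, Iterable


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its body as text (standard library only)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def enrich(raw: bytes, fetch: Callable[[str], str] = fetch_html) -> bytes:
    """Add an "html" field to a JSON message that carries a "url" field.

    The "url"/"html"/"error" field names are hypothetical.
    """
    message = json.loads(raw)
    try:
        message["html"] = fetch(message["url"])
    except Exception as exc:
        # A failed download should not stop the worker: record the error
        # and pass the message on without content.
        message["html"] = None
        message["error"] = str(exc)
    return json.dumps(message).encode("utf-8")


def run(
    incoming: Iterable[bytes],
    publish: Callable[[bytes], None],
    fetch: Callable[[str], str] = fetch_html,
) -> int:
    """Enrich every incoming message, publish it, and return the count."""
    count = 0
    for raw in incoming:
        publish(enrich(raw, fetch))
        count += 1
    return count
```

With a Kafka client such as kafka-python, `incoming` could be the `.value` of each record yielded by a `KafkaConsumer`, and `publish` could wrap `KafkaProducer.send` for the output topic. Running the same worker on several machines under one consumer group is what would let Kafka split the input topic's partitions between them.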