twds-crawler
This repository contains the code for a highly scalable web crawler for towardsdatascience.com, built with Python, Selenium, Docker, Kubernetes, and the infrastructure of the Google Cloud Platform. It was developed as part of a data science class to gain hands-on experience with some of the most common technologies for large-scale web crawling and big data processing.
Documentation
A more detailed description of the implementation can be found in my medium.com article.
Troubleshooting
Additionally, some of the challenges I encountered are documented in trouble-shooting.md.