Data Engineers are the data professionals who prepare "big data" infrastructure for analysis. They design, build, and integrate data from various sources, and typically run ETL (Extract, Transform and Load) processes over large datasets.
This repository presents an example of how to consume a web API and develop a web crawler in order to build a Data Lake (i.e., a data repository of blobs or raw files).
Amazon S3 was used as the Data Lake, and the code was developed in Python 3.
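A common convention when landing raw files in an S3-backed Data Lake is to partition object keys by source and date. The sketch below illustrates that idea; the `raw_key` helper and the `raw/<source>/<date>/` layout are illustrative assumptions, not the exact scheme used by `python/s3_handler.py`, and the upload step requires the `boto3` AWS SDK plus valid credentials.

```python
from datetime import datetime


def raw_key(source: str, dt: datetime, filename: str) -> str:
    """Build a date-partitioned object key for the raw zone of the lake.

    Hypothetical layout: raw/<source>/<YYYY>/<MM>/<DD>/<filename>.
    """
    return f"raw/{source}/{dt:%Y/%m/%d}/{filename}"


def upload_raw(local_path: str, bucket: str, key: str) -> None:
    """Upload a local raw file to S3 (needs boto3 and AWS credentials)."""
    import boto3  # third-party AWS SDK

    boto3.client("s3").upload_file(local_path, bucket, key)


if __name__ == "__main__":
    # Keys stay sortable and easy to filter by prefix in S3.
    print(raw_key("noticias", datetime(2019, 5, 20), "page-1.html"))
    # raw/noticias/2019/05/20/page-1.html
```

Date-partitioned prefixes keep raw files organized and make later ETL jobs able to select a day or month of data by listing a single prefix.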
- Example of consuming the API:

```shell
python python/deputados_api.py
```
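Consuming a JSON web API generally comes down to requesting an endpoint and picking the fields of interest out of the response. The sketch below shows the pattern with only the standard library; the endpoint URL and the `dados`/`nome` field names are assumptions based on the Brazilian Chamber of Deputies open-data API, not a copy of `deputados_api.py`.

```python
import json
from urllib.request import urlopen

# Assumed open-data endpoint (check the script for the real one used).
API_URL = "https://dadosabertos.camara.leg.br/api/v2/deputados"


def parse_names(payload: str) -> list:
    """Extract deputy names from a JSON payload shaped like {"dados": [...]}."""
    return [d["nome"] for d in json.loads(payload)["dados"]]


def fetch_deputados() -> list:
    """Download and parse the current list of deputies (network required)."""
    with urlopen(API_URL, timeout=30) as resp:
        return parse_names(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Offline demonstration with a hand-made sample payload.
    sample = '{"dados": [{"nome": "Fulano"}, {"nome": "Beltrana"}]}'
    print(parse_names(sample))  # ['Fulano', 'Beltrana']
```

Keeping the parsing in a pure function (`parse_names`) separates network I/O from data handling, which makes the consumer easy to test without hitting the API.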
- Example of a web crawler:

```shell
python python/noticias_crawler.py
```
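At its core, a crawler downloads a page, extracts the links it wants to follow, and repeats. The sketch below shows only the link-extraction step using the standard library's `html.parser`; the `LinkExtractor` class and example URLs are illustrative and are not taken from `noticias_crawler.py`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect href targets from <a> tags, resolved against a base URL."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links so the crawler can fetch them.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    page = '<a href="/noticias/1">n1</a> <a href="http://ex.com/2">n2</a>'
    print(extract_links(page, "http://example.com"))
    # ['http://example.com/noticias/1', 'http://ex.com/2']
```

A full crawler would feed the extracted links back into a fetch queue (deduplicating visited URLs) and save each downloaded page as a raw file in the Data Lake.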
- Amazon S3 handler: `python/s3_handler.py`
- Created by Leonardo Mauro ~ leomaurodesenv
- Presented by Itera - GitHub