leomaurodesenv / datathon-2019-data-engineer

An example of how to consume a web API and develop a web crawler


Data Engineer - Datathon Example

Data Engineers are the data professionals who prepare the "big data" infrastructure for analysis. They often design, build, and integrate data from various sources, and they usually run ETL (Extract, Transform, and Load) jobs on top of large datasets.

This repository presents an example of how to consume a web API and develop a web crawler in order to build our own Data Lake (i.e., a repository of blobs or raw files).
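As a rough illustration of the "consume a web API" step, here is a minimal sketch that requests a JSON endpoint using only the standard library. The function name `fetch_json`, the `opener` parameter, and the example URL are assumptions made for this sketch; they are not taken from `python/deputados_api.py`, which may work differently.

```python
import json
import urllib.request
from urllib.parse import urlencode


def fetch_json(base_url, params=None, opener=urllib.request.urlopen, timeout=10):
    """Request base_url (plus optional query parameters) and decode the JSON body.

    `opener` is a hypothetical hook, defaulting to urllib's urlopen, so the
    function can be exercised without network access.
    """
    url = f"{base_url}?{urlencode(params)}" if params else base_url
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with opener(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage (placeholder URL, not the API used by this repository):
# items = fetch_json("https://api.example.org/items", {"page": 1})
```

The raw response can then be stored unchanged, which fits the Data Lake idea of keeping blobs or raw files and deferring transformation to a later ETL step.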

Amazon S3 was used as the Data Lake, and the code was written in Python 3.
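To make the S3-as-Data-Lake idea concrete, the sketch below writes a raw JSON payload under a date-partitioned key. It assumes a client object exposing `put_object(Bucket=..., Key=..., Body=...)`, as the boto3 S3 client does. The `raw/<source>/<date>/` key layout and the names `build_key` and `store_raw` are illustrative assumptions, not the interface of `python/s3_handler.py`.

```python
import json
from datetime import datetime, timezone


def build_key(source, name, when):
    """Build a date-partitioned object key (assumed layout: raw/<source>/YYYY/MM/DD/<name>)."""
    return f"raw/{source}/{when:%Y/%m/%d}/{name}"


def store_raw(client, bucket, source, name, payload, when=None):
    """Serialize payload as JSON and upload it with client.put_object; return the key used."""
    when = when or datetime.now(timezone.utc)
    key = build_key(source, name, when)
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    client.put_object(Bucket=bucket, Key=key, Body=body)
    return key


# Hypothetical usage with boto3 (requires AWS credentials; bucket name is a placeholder):
# import boto3
# store_raw(boto3.client("s3"), "my-data-lake", "api", "items.json", {"id": 1})
```

Passing the client in as an argument keeps the upload logic independent of credentials, so it can be checked locally with a stand-in object.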


Python

  1. Example consuming the API.
     python python/deputados_api.py
  2. Example of a web crawler.
     python python/noticias_crawler.py
  3. Amazon S3 handler.
     • python/s3_handler.py
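For readers new to crawling, the core step of a web crawler is extracting links from a fetched page so they can be visited next. The sketch below shows that step with the standard library only. `LinkCollector` and `extract_links` are hypothetical names, and this is not necessarily how `python/noticias_crawler.py` is implemented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute, de-duplicated href targets from <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page URL.
            absolute = urljoin(self.base_url, href)
            if absolute not in self.links:
                self.links.append(absolute)


def extract_links(html, base_url):
    """Return the links found in an HTML string, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

A crawler would typically call `extract_links` on each downloaded page, filter the results to the pages of interest, and store the raw HTML before any parsing of article content.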

Also look ~



Languages

Language:Python 100.0%