

Data Engineering Interview Assignment | Wave HQ


Arsalan Noorafkan

2021-09-10

Instructions

Specifications

  • The ETL script is written in Python
    • Python libraries used include pandas, PySpark, Requests, glob (standard library), and SQLAlchemy (an example install command follows this list)

Steps

The ETL process consists of the following steps:

  1. Make API calls to extract dimension tables and fact data
  2. Store the raw dimension and fact data as partitioned CSV files (a sketch of steps 1-2 follows this list)
  3. Load the partitioned CSV files into a SQLite database for staging and analysis
  4. Run the required analysis queries in SQL and show the results in the console
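
As a rough illustration of steps 1 and 2, the sketch below pulls JSON from a placeholder endpoint with Requests and writes date-partitioned CSVs with pandas. The URL, the "date" column, and the datalake directory layout are assumptions for illustration only; the actual logic lives in etl.py.

    import os
    import requests
    import pandas as pd

    # Placeholder endpoint; the real URL and parameters live in etl.py
    API_URL = "https://api.example.com/transactions"

    def extract_and_store(output_dir="datalake"):
        # Step 1: extract raw fact data from the API
        response = requests.get(API_URL, timeout=30)
        response.raise_for_status()
        df = pd.DataFrame(response.json())

        # Step 2: store the raw data as CSV files partitioned by date
        for date, partition in df.groupby("date"):
            partition_dir = os.path.join(output_dir, f"date={date}")
            os.makedirs(partition_dir, exist_ok=True)
            partition.to_csv(os.path.join(partition_dir, "data.csv"), index=False)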

Testing Script

Follow the steps below to test the ETL process using sample JSON data files.

  1. Install Python libraries

  2. Open a terminal window and cd to the 'pipeline' folder that contains the etl.py and query.py files

    cd c:/usr/documents/Project/pipeline

  3. Run the etl.py script to extract data from the API and load it into the data lake

    python etl.py

  4. Run the query.py script to load the data lake files into SQLite and run the queries (see the sketch after this list)

    python query.py
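
For reference, the sketch below shows one way steps 3 and 4 could look: glob discovers the partitioned CSVs, pandas stages them into a SQLite table via SQLAlchemy, and a query result is printed to the console. The database name, table name, and query are illustrative assumptions; see query.py for the real queries.

    import glob
    import pandas as pd
    from sqlalchemy import create_engine, text

    def load_and_query(datalake_dir="datalake"):
        # Step 3: stage every partitioned CSV file into one SQLite table
        engine = create_engine("sqlite:///staging.db")  # illustrative DB name
        for path in glob.glob(f"{datalake_dir}/**/*.csv", recursive=True):
            pd.read_csv(path).to_sql("transactions", engine,
                                     if_exists="append", index=False)

        # Step 4: run an analysis query and show the results in the console
        with engine.connect() as conn:
            rows = conn.execute(text("SELECT COUNT(*) FROM transactions"))
            print(rows.fetchall())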


Please see the "output.txt" file for an example of the console log from a test run of the pipeline.




