vidIQ Technical Assessment
The tasks at hand are:
1. Load the attached data file to an S3 bucket
2. Implement a partitioned Athena database, creating its schema using Python (a rough sketch of this step appears below)
3. Use Airflow to add new partitions for daily events
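As a rough sketch of what step 2 involves (not the actual code in vidIQELT.py), a partitioned table schema can be created from Python with boto3. The database, table, column, and S3 names below are placeholders, and the JSON SerDe is an assumption about the data file's format:

```python
import boto3

athena = boto3.client("athena")

# Placeholder names; the real values come from dl.cfg (see Setup below).
DATABASE = "events_db"
OUTPUT_LOCATION = "s3://my-bucket/athena-results/"

# Create a table partitioned by date so daily events can be registered as partitions.
# (The database itself can be created first with CREATE DATABASE IF NOT EXISTS.)
create_table_sql = f"""
CREATE EXTERNAL TABLE IF NOT EXISTS {DATABASE}.daily_events (
    event_name string,
    user_id    string
)
PARTITIONED BY (event_date string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://my-bucket/events/'
"""

athena.start_query_execution(
    QueryString=create_table_sql,
    QueryExecutionContext={"Database": DATABASE},
    ResultConfiguration={"OutputLocation": OUTPUT_LOCATION},
)
```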
Setup
- Python 3 & Airflow are used
- Dependencies are set in the requirements file and at the imports
- Create the AWS config:
  - Create a file named dl.cfg
  - Add the following contents (fill in the fields):

        [AWS]
        AWS_ACCESS_KEY_ID=
        AWS_SECRET_ACCESS_KEY=

        [S3]
        BUCKET_NAME =
        OUTPUT_LOCATION =
        SOURCE_S3_KEY =
        DEST_S3_KEY =
        DATABASE =
- Initialize Airflow & run the webserver (example commands appear after this list)
- Run the scheduler (open a new terminal tab)
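The exact commands depend on the Airflow version; on Airflow 1.10 they are roughly the following (on Airflow 2.x, initialize with airflow db init instead):

    airflow initdb            # initialize the Airflow metadata database (1.10 syntax)
    airflow webserver -p 8080 # UI is served on port 8080 by default
    airflow scheduler         # run in a new terminal tab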
Usage
- First run upload_to_aws.py, then vidIQELT.py (a minimal sketch of the upload step appears after this list)
- Access the Airflow UI at localhost (http://localhost:8080 by default)
- Create Airflow Connections
- Run the DAGs in the Airflow UI
- Alternatively, you can just export the connection and credentials via the CLI:
export AIRFLOW_CONN_AWS_DEFAULT="s3://$AWS_CLIENT_ID:$AWS_CLIENT_SECRET@my-bucket?region_name=$AWS_REGION"
export AWS_DEFAULT_REGION=$AWS_REGION
export AWS_ACCESS_KEY_ID=$AWS_CLIENT_ID
export AWS_SECRET_ACCESS_KEY=$AWS_CLIENT_SECRET
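upload_to_aws.py is not reproduced here; as a minimal sketch of what the upload step typically looks like, assuming the dl.cfg layout from Setup and a local file named data.json (a placeholder for the attached data file):

```python
import configparser

import boto3

# Read credentials and S3 settings from dl.cfg (see Setup).
config = configparser.ConfigParser()
config.read("dl.cfg")

s3 = boto3.client(
    "s3",
    aws_access_key_id=config["AWS"]["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=config["AWS"]["AWS_SECRET_ACCESS_KEY"],
)

# Upload the local data file to the configured bucket and key.
s3.upload_file(
    Filename="data.json",  # placeholder for the attached data file
    Bucket=config["S3"]["BUCKET_NAME"],
    Key=config["S3"]["DEST_S3_KEY"],
)
```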
Test
airflow test partitioned_athena_and_S3move <TASK_ID> <EXECUTION_DATE>
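For reference when filling in <TASK_ID>, here is a stripped-down sketch of what a daily partition-adding task inside the partitioned_athena_and_S3move DAG could look like. It assumes the Amazon provider's AthenaOperator and placeholder table/bucket names; the actual DAG in vidIQELT.py may differ, and on Airflow 1.10 the operator lives at airflow.contrib.operators.aws_athena_operator.AWSAthenaOperator:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.athena import AthenaOperator

with DAG(
    dag_id="partitioned_athena_and_S3move",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Register the execution date's partition so Athena can query that day's events.
    add_daily_partition = AthenaOperator(
        task_id="add_daily_partition",
        query=(
            "ALTER TABLE daily_events ADD IF NOT EXISTS "
            "PARTITION (event_date = '{{ ds }}') "
            "LOCATION 's3://my-bucket/events/{{ ds }}/'"
        ),
        database="events_db",
        output_location="s3://my-bucket/athena-results/",
        aws_conn_id="aws_default",
    )
```

Under these assumptions, the test command above would be: airflow test partitioned_athena_and_S3move add_daily_partition <EXECUTION_DATE>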