sudip-padhye / Data-Lake-using-PySpark

A data lake hosted on an AWS EMR cluster, with S3 buckets used as the source and output storage. The analysis was done using AWS Athena.
