Brandyli / AWS

AWS_Athena_yelp_dataset

  • Set up AWS Glue to provision and run the ETL pipeline, which takes the Yelp dataset stored as JSON in an S3 bucket and converts it into a queryable format
  • Create an S3 bucket to hold the Athena query results
  • Configure a Glue Crawler to connect to the data source
  • Write a series of Athena queries that compute aggregates, such as the state ranking, then download the results in CSV format
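The crawler step above can also be scripted rather than clicked through in the console. The following is a minimal sketch, assuming a boto3 Glue client is passed in; the crawler name, IAM role ARN, database name, and S3 path are hypothetical placeholders and are not taken from this repository.

```python
def build_crawler_config(name, role_arn, database, s3_path):
    """Build the keyword arguments for the Glue create_crawler call.

    All four values are placeholders supplied by the caller; none of them
    come from this repository.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        # The crawler scans this S3 prefix and infers a table schema from the JSON files.
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(glue_client, config):
    """Register the crawler, start one run, and return the crawler name."""
    glue_client.create_crawler(**config)
    glue_client.start_crawler(Name=config["Name"])
    return config["Name"]


# Example usage (requires boto3 and AWS credentials; all names are hypothetical):
#
#   import boto3
#   glue = boto3.client("glue")
#   config = build_crawler_config(
#       name="yelp-crawler",
#       role_arn="arn:aws:iam::123456789012:role/MyGlueRole",
#       database="yelp_db",
#       s3_path="s3://my-yelp-bucket/raw/",
#   )
#   create_and_start_crawler(glue, config)
```

Passing the client in as an argument keeps the functions easy to try out with a stub before pointing them at a real AWS account.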

Query example:

SELECT state, COUNT(*) AS num_states FROM yelp GROUP BY state ORDER BY num_states DESC LIMIT 10;
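The query above can also be submitted from a notebook instead of the Athena console. This is a rough sketch, assuming a boto3 Athena client is passed in; the database name and the S3 output location are hypothetical, and the polling interval is an arbitrary choice. Athena writes the result as a CSV file under the output location, which is what the function returns on success.

```python
import time

# The state-ranking query from the example above.
STATE_RANK_SQL = """
SELECT state, COUNT(*) AS num_states
FROM yelp
GROUP BY state
ORDER BY num_states DESC
LIMIT 10
"""


def run_query(athena_client, sql, database, output_location,
              poll_seconds=2.0, sleep=time.sleep):
    """Submit a query, poll until it finishes, and return the S3 path of the CSV result.

    `database` and `output_location` are caller-supplied placeholders.
    """
    started = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    while True:
        response = athena_client.get_query_execution(QueryExecutionId=query_id)
        execution = response["QueryExecution"]
        state = execution["Status"]["State"]
        if state == "SUCCEEDED":
            # Athena reports the full path of the result file here.
            return execution["ResultConfiguration"]["OutputLocation"]
        if state in ("FAILED", "CANCELLED"):
            reason = execution["Status"].get("StateChangeReason", "no reason given")
            raise RuntimeError(f"Query {query_id} ended in state {state}: {reason}")
        # Still QUEUED or RUNNING: wait before checking again.
        sleep(poll_seconds)


# Example usage (requires boto3 and AWS credentials; all names are hypothetical):
#
#   import boto3
#   athena = boto3.client("athena")
#   csv_path = run_query(athena, STATE_RANK_SQL, "yelp_db", "s3://my-athena-results/")
#   print(csv_path)
```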

AWS Glue

[Screenshot: AWS Glue]

AWS Glue Crawler

[Screenshot, 2020-05-06 3:16:40 PM]

Queries with AWS Athena

[Screenshot, 2020-05-06 10:02:36 PM]
