enrialonso / EMRintro

Intro to EMR

Introduction to Amazon EMR

Amazon EMR is the industry-leading cloud-native big data platform for processing vast amounts of data quickly and cost-effectively at scale. Using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi (Incubating), and Presto, coupled with the dynamic scalability of Amazon EC2 and the scalable storage of Amazon S3, EMR gives analytical teams the engines and elasticity to run petabyte-scale analysis for a fraction of the cost of traditional on-premises clusters. Teams can run use cases on single-purpose, short-lived clusters that automatically scale to meet demand, or on long-running, highly available clusters using the new multi-master deployment mode.

Learn more about Amazon EMR here.

Labs

| Part | Lab Name | Lab Description |
|------|----------|-----------------|
| 1 | 1a - Getting Started | Connect to the AWS Management Console |
| | 1b - Cloud9 | Create the Cloud9 Environment |
| | 1c - EMR | Create the EMR Cluster |
| 2 | 2a - S3 | Create and Populate your S3 Bucket |
| | 2b - Hive CLI | Run Hive via Hive Shell CLI |
| | 2c - Hive and EMR Steps | Run Hive via EMR Steps |
| | 2d - Pig and EMR Steps | Run Pig via EMR Steps |
| 3 | 3a - Spark Submit | Run Spark via Spark Submit |
| | 3b - Spark Logging | Work with Spark Logs and Spark UI |
| 4 | 4 - EMR Notebooks | Run PySpark via EMR Notebooks/Jupyter |
| 5 | 5 - Next Steps | Next Steps for EMR |
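
Several labs above create a cluster and then submit work to it as EMR steps. As a rough illustration of what that involves, the sketch below builds the request for a short-lived cluster that runs one Spark step and then terminates. It is not taken from the labs themselves: the bucket name, script path, release label, and instance types are placeholder assumptions, and the helper functions `spark_submit_step` and `transient_cluster_request` are hypothetical names introduced only for this example. The dictionary keys follow the shape expected by boto3's EMR `run_job_flow` call.

```python
def spark_submit_step(name, script_uri, action_on_failure="CONTINUE"):
    """Build an EMR step that runs spark-submit via command-runner.jar."""
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
        },
    }


def transient_cluster_request(cluster_name, release_label, log_uri, steps):
    """Build keyword arguments for a cluster that shuts down after its steps."""
    return {
        "Name": cluster_name,
        "ReleaseLabel": release_label,
        "LogUri": log_uri,
        "Applications": [{"Name": "Spark"}, {"Name": "Hive"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # False makes the cluster transient: it terminates once all steps finish.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": steps,
        # Default role names; these are assumed to already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


# Placeholder values (assumptions, not from the labs).
step = spark_submit_step("example-spark-job", "s3://example-bucket/scripts/job.py")
request = transient_cluster_request(
    "emr-intro-example", "emr-6.15.0", "s3://example-bucket/logs/", [step]
)

# With AWS credentials configured, the request could be submitted like this:
# import boto3
# response = boto3.client("emr").run_job_flow(**request)
```

Building the request as plain data keeps the example runnable without an AWS account. Submitting it for real would incur charges and requires that the default EMR roles exist in the account.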
