Introduction to Amazon EMR

Amazon EMR is the industry-leading cloud-native big data platform for processing vast amounts of data quickly and cost-effectively at scale. It combines open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi (Incubating), and Presto with the dynamic scalability of Amazon EC2 and the scalable storage of Amazon S3. This gives analytical teams the engines and elasticity to run petabyte-scale analysis for a fraction of the cost of traditional on-premises clusters. Teams can run use cases on single-purpose, short-lived clusters that automatically scale to meet demand, or on long-running, highly available clusters using the new multi-master deployment mode.

Learn more about Amazon EMR here.

Labs

Part	Lab Name	Lab Description
1	1a - Getting Started	Connect to the AWS Management Console
	1b - Cloud9	Create the Cloud9 Environment
	1c - EMR	Create the EMR Cluster
2	2a - S3	Create and Populate your S3 Bucket
	2b - Hive CLI	Run Hive via Hive Shell CLI
	2c - Hive and EMR Steps	Run Hive via EMR Steps
	2d - Pig and EMR Steps	Run Pig via EMR Steps
3	3a - Spark Submit	Run Spark via Spark Submit
	3b - Spark Logging	Work with Spark Logs and Spark UI
4	4 - EMR Notebooks	Run PySpark via EMR Notebooks/Jupyter
5	5 - Next Steps	Next Steps for EMR
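
As a preview of what labs 2c and 3a involve, the following is a minimal sketch of submitting a Spark job to a running cluster as an EMR step with boto3. It is not taken from the labs themselves. The cluster ID, bucket name, script path, and region are hypothetical placeholders, and the helper names `build_spark_step` and `submit_step` are made up for illustration.

```python
from typing import Dict, List, Optional


def build_spark_step(name: str, script_uri: str,
                     args: Optional[List[str]] = None) -> Dict:
    """Build an EMR step definition that runs spark-submit on the cluster.

    command-runner.jar is the EMR-provided jar that runs a command
    (here spark-submit) on the cluster's primary node.
    """
    return {
        "Name": name,
        # Keep the cluster running if this step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     script_uri, *(args or [])],
        },
    }


def submit_step(cluster_id: str, step: Dict, region: str = "us-east-1") -> str:
    """Submit one step to an existing cluster and return its step ID.

    Requires boto3 and AWS credentials. boto3 is imported lazily so the
    step builder above can be used and tested without it.
    """
    import boto3

    emr = boto3.client("emr", region_name=region)
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]


if __name__ == "__main__":
    # Hypothetical bucket and script; replace with your own.
    step = build_spark_step(
        "wordcount",
        "s3://my-example-bucket/scripts/wordcount.py",
        ["s3://my-example-bucket/input/"],
    )
    print(step)
    # To actually submit (hypothetical cluster ID):
    # print(submit_step("j-XXXXXXXXXXXXX", step))
```

Separating the step definition from the submission call keeps the definition easy to inspect before anything is sent to AWS. The same dictionary shape applies to other step types, with different `Args`.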
