Amazon EMR is the industry-leading cloud-native big data platform for processing vast amounts of data quickly and cost-effectively at scale. Using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi (Incubating), and Presto, coupled with the dynamic scalability of Amazon EC2 and the scalable storage of Amazon S3, EMR gives analytical teams the engines and elasticity to run petabyte-scale analysis for a fraction of the cost of traditional on-premises clusters. EMR gives teams the flexibility to run use cases on single-purpose, short-lived clusters that automatically scale to meet demand, or on long-running, highly available clusters using the new multi-master deployment mode.
Learn more about Amazon EMR here.
Part | Lab Name | Lab Description |
---|---|---|
1 | 1a - Getting Started | Connect to the AWS Management Console |
1 | 1b - Cloud9 | Create the Cloud9 Environment |
1 | 1c - EMR | Create the EMR Cluster |
2 | 2a - S3 | Create and Populate your S3 Bucket |
2 | 2b - Hive CLI | Run Hive via Hive Shell CLI |
2 | 2c - Hive and EMR Steps | Run Hive via EMR Steps |
2 | 2d - Pig and EMR Steps | Run Pig via EMR Steps |
3 | 3a - Spark Submit | Run Spark via Spark Submit |
3 | 3b - Spark Logging | Work with Spark Logs and Spark UI |
4 | 4 - EMR Notebooks | Run PySpark via EMR Notebooks/Jupyter |
5 | 5 - Next Steps | Next Steps for EMR |