There are 3 repositories under the amazon-emr topic.
Reference Architectures for Datalakes on AWS
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work
Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for Apache Airflow (MWAA) on AWS.
Project files for the post: Running PySpark Applications on Amazon EMR: Methods for Interacting with PySpark on Amazon Elastic MapReduce.
A command-line interface for packaging, deploying, and running your EMR Serverless Spark jobs
A VS Code Extension to make it easier to manage and develop Spark jobs on EMR
Run templatable playbooks of Hadoop, Spark, and similar jobs on Amazon EMR
Amazon EMR Notebook to show how to read from and write to Delta tables with Amazon EMR
⛳️ PASS: Amazon Web Services Certified (AWS Certified) Machine Learning Specialty (MLS-C01) by learning based on our Questions & Answers (Q&A) Practice Tests Exams.
This repo provides cross-account integration code samples using Amazon S3 Access points
Sample CI/CD pipeline for using GitHub Actions with Amazon EMR Serverless Spark.
:notebook: Repository/Tutorial for initializing Jupyter Notebook and Spark cluster on Amazon EMR
Orchestrate an Amazon EMR on Amazon EKS Spark job with AWS Step Functions
⛳️ PASS: Amazon Web Services Certified (AWS Certified) Data Analytics Specialty (DAS-C01) by learning based on our Questions & Answers (Q&A) Practice Tests Exams.
Configure Hadoop YARN CapacityScheduler on Amazon EMR on Amazon EC2 for multi-tenant heterogeneous workloads
Samples related to data engineering, e.g. Spark, Embulk, and Airflow.
Project files for the post: Installing Apache Superset on Amazon EMR: Add data exploration and visualization to your analytics cluster.
PageRank implementation in Spark to rank authors and venues based on their publications in the DBLP dataset.
Data Lake with Spark
Amazon EMR Automatic Scaling using Custom Metrics
An implementation in Scala of kNN and NCC based on Spark
Used Amazon's Elastic MapReduce to rank the top 20 nodes based on PageRank of graphs with over 100,000 nodes http://courses.cms.caltech.edu/cs144/homeworks/rankmaniac.pdf
Udacity Data Engineering Capstone project
CMPT 732 Project: Our project revolves around a bike-sharing firm. As analysts for that business, we use several big data tools to offer insights into various use cases, predict future profits, and help the firm expand its business.
Udacity Data Engineering Nanodegree Program
Using Amazon EMR and machine learning techniques supported by PySpark, a model was built to help a fictitious music streaming service provider predict customer churn rate based on user click data.
Unofficial Ansible module for Amazon EMR
Building Data Lake and ETL pipelines using Amazon EMR, S3, and Apache Spark
A simple Java-Scala mixed project template for Apache Spark