amazon-emr

There are 3 repositories under the amazon-emr topic.

aws-samples / aws-dbs-refarch-datalake
Reference Architectures for Datalakes on AWS
data-lake data-analytics amazon-emr ingest-data emr-cluster glue hive-metastore data-catalog data-transformation
Language: HTML · Stars: 75
dacort / modern-data-lake-storage-layers
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work
aws amazon-emr hudi iceberg apache-hudi apache-iceberg delta-lake
Language: Jupyter Notebook · Stars: 44
garystafford / aws-airflow-demo
Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for Apache Airflow (MWAA) on AWS.
airflow amazon-emr pyspark-applications apache-airflow aws amazon-mwaa
Language: Python · Stars: 42
garystafford / emr-demo
Project files for the post: Running PySpark Applications on Amazon EMR: Methods for Interacting with PySpark on Amazon Elastic MapReduce.
emr-demo amazon-emr pyspark elastic-map-reduce aws spark
Language: Python · Stars: 38
awslabs / amazon-emr-cli
A command-line interface for packaging, deploying, and running your EMR Serverless Spark jobs
aws amazon-emr apache-spark emr-serverless
Language: Python · Stars: 33
awslabs / amazon-emr-vscode-toolkit
A VS Code Extension to make it easier to manage and develop Spark jobs on EMR
amazon-emr apache-spark pyspark python
Language: TypeScript · Stars: 28
dacort / demo-code
Bits of code I use during live demos
emr-cluster aws-emr emr-notebooks aws-cloudformation aws-cloudformation-templates aws-athena amazon-athena amazon-emr live-demos
Language: Jupyter Notebook · Stars: 28
snowplow / dataflow-runner
Run templatable playbooks of Hadoop, Spark, and other jobs on Amazon EMR
amazon-emr flink golang-application hadoop spark
Language: Go · Stars: 19
aws-samples / amazon-emr-with-delta-lake
Amazon EMR Notebook to show how to read from and write to Delta tables with Amazon EMR
amazon-emr spark deltalake
Language: Jupyter Notebook · Stars: 17
polakowo / yelp-3nf
3NF-normalize Yelp data on S3 with Spark and load it into Redshift - automate the whole thing with Apache Airflow
yelp-dataset amazon-redshift data-marts data-pipeline dimensional-tables s3-bucket spark etl-process data-warehouse airflow nosql sql amazon-emr cloud 3nf normalization
Language: Jupyter Notebook · Stars: 12
Ditectrev / Amazon-Web-Services-Certified-AWS-Certified-Machine-Learning-MLS-C01-Practice-Tests-Exams-Question
⛳️ PASS: Amazon Web Services Certified (AWS Certified) Machine Learning Specialty (MLS-C01) by learning based on our Questions & Answers (Q&A) Practice Tests Exams.
amazon-athena amazon-cloudwatch amazon-comprehend amazon-ec2 amazon-emr amazon-s3 amazon-sagemaker amazon-textract amazon-transcribe apache-spark aws aws-batch aws-certified aws-certified-machine-learning aws-glue aws-lambda linear-regression machine-learning mls-c01 neural-network
Stars: 11
aws-samples / amazon-s3-access-points-for-cross-account-integration-samples
This repo provides cross-account integration code samples using Amazon S3 Access Points.
amazon-s3 amazon-s3-access-points amazon-emr aws-cross-account-s3-integration
Language: Java · Stars: 5
build-on-aws / ci-cd-serverless-spark
Sample CI/CD pipeline for using GitHub Actions with Amazon EMR Serverless Spark.
amazon-emr apache-spark aws github-actions serverless spark
Language: Python · Stars: 5
cameres / emr-spark-jupyter
📓 Repository/tutorial for initializing a Jupyter Notebook and a Spark cluster on Amazon EMR
amazon-emr cluster emr jupyter jupyter-notebook spark spark-clusters tutorial
Language: Python · Stars: 4
DeepHiveMind / Amazon-EMR-on-Amazon-EKS-Spark-job-with-AWS-Step-Functions
Orchestrate an Amazon EMR on Amazon EKS Spark job with AWS Step Functions
amazon-emr amazon-eks spark aws-step-functions aws
Stars: 3
Ditectrev / Amazon-Web-Services-Certified-AWS-Certified-Data-Analytics-DAS-C01-Practice-Tests-Exams-Question
⛳️ PASS: Amazon Web Services Certified (AWS Certified) Data Analytics Specialty (DAS-C01) by learning based on our Questions & Answers (Q&A) Practice Tests Exams.
amazon-athena amazon-aurora amazon-cloudwatch amazon-ec2 amazon-emr amazon-quicksight amazon-rds amazon-s3 apache-kafka apache-spark aws aws-certified aws-data-analytics aws-glue aws-lambda das-c01 hdfs practice-exam practice-exams practice-test
Stars: 3
aws-samples / amazon-emr-yarn-capacity-scheduler
Configure Hadoop YARN CapacityScheduler on Amazon EMR on Amazon EC2 for multi-tenant heterogeneous workloads
amazon-emr apache-hadoop-yarn aws-cloudformation capacity-scheduler fair-scheduler fifo-scheduler
Language: Shell · Stars: 1
esakik / data-engineering-essentials
Samples related to data engineering, e.g. Spark, Embulk, and Airflow.
apache-beam apache-spark apache-airflow apache-hadoop fluentd embulk digdag mrjob cloud-dataproc cloud-dataflow amazon-emr protocol-buffers apache-avro data-engineering
Language: Python · Stars: 1
garystafford / emr-superset-demo
Project files for the post: Installing Apache Superset on Amazon EMR: Add data exploration and visualization to your analytics cluster.
amazon-emr apache-superset superset aws
Language: Python · Stars: 1
Mohammed-siddiq / Page-Rank-In-Spark
PageRank implementation in Spark that ranks authors and venues based on their publications in the DBLP dataset.
amazon-emr amazon-s3 dblp-dataset pagerank-algorithm sbt scala spark
Language: Scala · Stars: 1
Sampsonyu / Data_Lake_with_Spark
Data Lake with Spark
python3 aws amazon-emr amazon-s3 spark spark-sql data-lake elt
Language: Jupyter Notebook · Stars: 1
tmusabbir / emr-with-custom-metrics
Amazon EMR Automatic Scaling using Custom Metrics
emr amazon-emr amazon-web-services cloudwatch bigdata emr-cluster
Language: Shell · Stars: 1
DarthVi / knn-ncc-spark
A Scala implementation of kNN and NCC on Spark
scala spark knn ncc machine-learning amazon-emr
Language: Scala · Stars: 0
jaceyca / Rankmaniac
Used Amazon's Elastic MapReduce to rank the top 20 nodes based on PageRank of graphs with over 100,000 nodes http://courses.cms.caltech.edu/cs144/homeworks/rankmaniac.pdf
mapreduce amazon-emr
Language: Python · Stars: 0
MrBenA / Udacity_Capstone-ETL_Pipeline
Udacity Data Engineering Capstone project
amazon-web-services amazon-s3 amazon-emr amazon-redshift apache-spark python data-modeling relational-databases extract-transform-load
Language: Python · Stars: 0
Rituraj0480 / Quad-Squad
CMPT 732 project: acting as analysts for a bike-sharing firm, we use several big data tools to provide insights into various use cases, predict the firm's future profits, and help it expand its business.
python pyspark sql amazon-emr amazon-s3 saprk sparkml tableau git jupyter-notebook
Language: Python · Stars: 0
robertgv / Data_Lake_in_AWS
Udacity Data Engineering Nanodegree Program
udacity udacity-nanodegree udacity-data-engineer-nanodegree amazon-s3 amazon-emr apache-spark
Language: Python · Stars: 0
timchansdp / Churn-Prediction-with-PySpark
Using Amazon EMR and the machine learning techniques supported by PySpark, we built a model that helps a fictitious music streaming service predict customer churn from user click data.
churn-prediction big-data pyspark amazon-emr
Language: Jupyter Notebook · Stars: 0
WorksApplications / ansible_aws_emr
Unofficial Ansible module for Amazon EMR
amazon-emr ansible-modules emr-management
Language: Python · Stars: 0
cmeb45 / fuzzyjoin
mapreduce string-matching aws-emr amazon-emr map-reduce string-similarity
Language: Java
Faisal-AlDhuwayhi / Data-Lake
Building Data Lake and ETL pipelines using Amazon EMR, S3, and Apache Spark
amazon-emr amazon-s3 aws big-data big-data-processing cloud-computing data-engineering data-lake etl-pipeline pyspark spark sql
Language: Python
Lostefra / SparkTemplate
A simple Java-Scala mixed project template for Apache Spark
amazon-emr intellij java sbt scala spark
Language: Scala