aws-emr

There are 9 repositories under the aws-emr topic.

adornes / spark_python_ml_examples
Spark 2.0 Python Machine Learning examples
machine-learning spark pyspark python aws aws-emr kaggle
Language: Python · Stars: 97
adornes / spark_scala_ml_examples
Spark 2.0 Scala Machine Learning examples
machine-learning scala spark aws aws-emr kaggle
Language: Scala · Stars: 77
jwplayer / sparksteps
⭐ CLI tool to launch Spark jobs on AWS EMR
spark aws aws-emr python
Language: Python · Stars: 67
dacort / demo-code
Bits of code I use during live demos
amazon-athena amazon-emr aws-athena aws-cloudformation aws-cloudformation-templates aws-emr emr-cluster emr-notebooks live-demos
Language: Jupyter Notebook · Stars: 28
abdullahkhawer / aws-auto-terminate-idle-emr
An AWS-based solution that uses AWS CloudWatch and a Python-based AWS Lambda function to automatically terminate AWS EMR clusters that have been idle for a specified period of time.
amazon-web-services aws aws-lambda aws-cloudformation cloudformation cft aws-emr terminate boto3 python-3-7 aws-cloudwatch idle emr cloudwatch serverless python etl bigdata datalake automation
Language: Python · Stars: 26
Wittline / pyspark-on-aws-emr
The goal of this project is to offer a ready-to-use AWS EMR template built on Spot Fleet and On-Demand Instances, so that you can focus on writing PySpark code.
aws emr-cluster aws-emr python spark pyspark big-data-analytics big-data dataengineering wordcloud-generator ec2-spot ec2-spot-instances
Language: Python · Stars: 25
terraform-aws-modules / terraform-aws-emr
Terraform module to create AWS EMR resources 🇺🇦
aws-emr aws-emr-clusters aws-emr-serverless terraform terraform-module
Language: HCL · Stars: 23
ismaildawoodjee / aws-data-pipeline
A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from locally hosted Airflow containers. The end product is a Superset dashboard and a Postgres database, hosted on an EC2 instance at this address (powered down):
airflow apache-superset aws aws-ec2 aws-emr aws-iam aws-redshift aws-s3 data-engineering data-pipeline docker elt etl infrastructure-as-code postgresql python sql terraform
Language: Python · Stars: 20
memosstilvi / emr-cost-calculator
EMR Cost Calculator
aws aws-emr python
Language: Python · Stars: 17
amzn / rheoceros
Cloud-based AI / ML workflow and data application development framework
bring-your-own-account aws ai flow cloud data-science event-based low-code-framework machine-learning feature-engineering aws-glue aws-emr sagemaker-notebook-instance sagemaker-notebook aws-lambda serverless spark pyspark scala-spark
Language: Python · Stars: 16
AWS-Big-Data-Projects / Analysing-Census-Data-using-aws
Uses AWS EMR and Amazon Redshift to analyse the US adult census dataset
apache-spark aws-emr aws-redshift aws-s3
Stars: 13
AWS-Big-Data-Projects / AWS-EMR
Analyzing Big Data with Amazon EMR
aws-emr big-data hadoop hive s3
Stars: 12
AWS-Big-Data-Projects / Run-a-Spark-job-within-Amazon-EMR
Run a Spark job within Amazon EMR
aws-emr aws-emr-clusters spark apache-spark aws-s3
Language: Java · Stars: 12
xonai-computing / xonai-dashboard
A Grafana-based application to assist Big Data infrastructure optimization initiatives where Spark applications are a dominant cost driver
apache-spark aws aws-emr databricks grafana prometheus python
Language: Python · Stars: 12
ychantit / airflow_aws_utils
A collection of sample Airflow workflows for data processing on AWS
aws airflow aws-emr aws-athena pythom
Language: Python · Stars: 12
linghaol / CommunityDetection-Spark-AWS
A Spark application, written in Python, that finds strongly connected components using a bidirectional label propagation algorithm. The project was run on a 1.3 GB Twitter network dataset on an AWS EMR cluster.
spark aws-emr twitter python
Language: Python · Stars: 10
mauropelucchi / aws-emr-docker-integration
AWS EMR Docker integration
aws aws-emr spark docker hadoop
Language: Dockerfile · Stars: 10
jkoth / Data-Lake-with-Spark-and-AWS-S3
Creates a data lake on AWS S3 to store dimensional tables after processing data with Spark on an AWS EMR cluster
apache-spark aws-s3 pyspark data-lake etl-pipeline dimensional-model star-schema json-format udacity-nanodegree data-engineering spark-dataframes aws-emr
Language: Python · Stars: 9
daniel-cortez-stevenson / cookiecutter-pyspark-cloud
A cookiecutter template for working with PySpark on AWS EMR
aws aws-emr cloudformation-template cookiecutter cookiecutter-datascience cookiecutter-python cookiecutter-template data-science jupyterhub pyspark python spark
Language: Python · Stars: 8
Nerdward / batch_gh_archive
Data Engineering Project with Terraform, Spark, AWS, Docker, Airflow and other tools
airflow aws spark terraform aws-emr docker
Language: Python · Stars: 8
sjmiller8182 / Warehousing-Stock-Tweet-Data
A large-scale data framework that will enable us to store and analyze financial market data and drive future predictions for investment.
big-data hive aws stock-prices tweets hadoop emr-cluster aws-emr nyse nasdaq twitter data-warehouse star-schemas python3 snowflake-schema warehousing-stock-data
Language: TSQL · Stars: 7
adornes / spark_r_ml_examples
Spark 2.0 R/SparkR Machine Learning examples
machine-learning spark r rstats aws aws-emr kaggle
Language: R · Stars: 6
wingkwong / aws-playground
My AWS Playground
aws-lambda aws-s3 aws-apigateway aws-cdk aws-copilot aws-codecommit aws-cognito aws-msk aws-serverless aws-vpc aws-amplify aws-acm aws-cloudfront aws-appsync aws-emr
Language: Python · Stars: 6
felipeazucares / Airflow-EMR-Redshift
An EMR + Hadoop to Redshift ELT workflow that uses the Spark steps API and is orchestrated by Apache Airflow. It ingests disparate datasets centred on 7 GB of I94 arrivals information to produce a simple star schema in Redshift.
apache-spark apache-airflow aws-s3 aws-emr-clusters aws-emr sas7bdat-datasets i94
Language: Python · Stars: 5
pratikbarjatya / spark-walmart-data-analysis-exercise
Data Analysis Exercise over Walmart Stock
walmart-data-analysis spark data-analysis-exercise aws-emr walmart-stock aws-ec2 hadoop python scala linux
Language: Jupyter Notebook · Stars: 5
khushal2405 / Daily-Incremental-load-ETL-pipeline-for-Ecommerce-company-using-AWS-Lambda-and-Apache-airflow
Daily incremental-load ETL pipeline for an e-commerce company using AWS Lambda and an AWS EMR cluster, deployed with Apache Airflow in a Docker container.
aws aws-ec2 aws-emr aws-emr-clusters aws-glue aws-lambda aws-s3 glue-jobs lambda lambda-functions lambda-trigger oltp postgres postgresql redshift redshift-aws docker docker-compose docker-container dockerfile
Language: Python · Stars: 4
khushal2405 / ETL-pipeline-using-Airflow-and-AWS-EMR
We build an ETL pipeline using Airflow that downloads data from an AWS S3 bucket, runs a Spark/Spark SQL job on the downloaded data to produce a cleaned-up dataset of orders that missed their delivery deadline, and then uploads the cleaned-up dataset back to the same S3 bucket, in a folder primed for higher-level analytics.
airflow apache-spark aws aws-emr data-engineering etl pyspark python s3 s3-bucket scala spark
Language: Python · Stars: 4
abhibalani / emr_lambda
A Lambda function that starts an EMR cluster and runs a MapReduce job
aws aws-lambda aws-emr aws-emr-clusters mapreduce-python hadoop-mapreduce
Language: Python · Stars: 3
HarshadRanganathan / aws-emr-launcher
A generic Python library for provisioning EMR clusters from YAML config files (Configuration as Code)
aws aws-emr emr-cluster
Language: Python · Stars: 3
JainTanisha / MapReduce-Analysis-on-Amazon-Food-Review-Data
MapReduce Analysis on Amazon Food Review Dataset (Big-Data)
hadoop-mapreduce hadoop-cluster apache-hive mahout tableau aws-s3 aws-ec2 aws-emr r rstudio
Stars: 3
jomavera / dataPipelineEMR
ETL pipeline with PySpark on EMR orchestrated with Airflow
data-pipeline data-engineering pyspark aws-emr apache-airflow
Language: Python · Stars: 3
shinde-chandrakant / BigData-Ops-on-TLC-Yellow-Taxi
Analysed New York City's Yellow taxi data set with Big Data tools such as Hadoop, HBase, Sqoop, MapReduce and AWS Cloud Infrastructure.
aws aws-emr aws-s3 big-data-analytics bigdata hadoop mrjob aws-rds data-modeling hbase mapreduce sqoop
Language: Python · Stars: 3
dhruv007patel / Impact-of-Covid-19-on-Aviation-Industry
This project analyzes the correlation between COVID-19 and the US aviation industry. By studying data on passenger/freight traffic and delays alongside COVID-19 trends, it provides insights into airline and passenger responses. The findings help airlines adapt to the pandemic's impact.
python3 aws-s3 aws-ec2 aws-athena aws-emr spark-sql spark tableau
Language: Python · Stars: 2
ninjeanne / datastorm
Data Science and Engineering project - Programming for Big Data @ Simon Fraser University (SFU)
aws aws-dynamodb aws-emr aws-lambda aws-s3 big-data bigdata data data-engineering data-science data-visualization python3 spark spark-dataframes spark-mllib spark-sql
Language: Jupyter Notebook · Stars: 2
RahilBalar98 / Covid-19-And-Aviation-Analysis
CMPT 732 project: joined three large-scale databases to analyse the economic impact of COVID-19 on the airline industry. Data was fetched through an API and stored in AWS S3, then retrieved by an AWS EMR cluster that performs the computation. The results were queried with AWS Athena and visualized in Tableau using static and dynamic dashboards.
aws aws-emr aws-s3 covid-19 pandas tableau visualization
Language: Python · Stars: 2
samchenghowing / COMP4442
An analysis and monitoring system using AWS; also the COMP4442 project
aws-apigateway aws-dynamodb aws-emr aws-lambda aws-s3
Language: Python · Stars: 2