aws-emr-clusters

There are 36 repositories under the aws-emr-clusters topic.

RubensZimbres / Repo-2019
BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics
aws-rds anomaly-detection googleassistant googlespeech keras-tensorflow sql-server raspberry-pi-3 tensorflow mathematica mathe wolfram-mathematica aws-emr-clusters pyspark hiveql emr-cluster bert-model bert
Language:Jupyter Notebook 136
terraform-aws-modules / terraform-aws-emr
Terraform module to create AWS EMR resources 🇺🇦
aws-emr aws-emr-clusters aws-emr-serverless terraform terraform-module
Language:HCL 22
AWS-Big-Data-Projects / Run-a-Spark-job-within-Amazon-EMR
Run a Spark job within Amazon EMR
aws-emr aws-emr-clusters spark apache-spark aws-s3
Language:Java 12
suvayu / emr-scripts
Shell scripts for AWS EMR clusters
cluster aws-emr-clusters spark aws-cli
Language:Shell 7
felipeazucares / Airflow-EMR-Redshift
EMR + Hadoop to Redshift ELT workflow using the Spark steps API and orchestrated by Apache Airflow. It ingests disparate datasets centered on 7 GB of I94 arrivals information to produce a simple star schema in Redshift.
apache-spark apache-airflow aws-s3 aws-emr-clusters aws-emr sas7bdat-datasets i94
Language:Python 5
khushal2405 / Daily-Incremental-load-ETL-pipeline-for-Ecommerce-company-using-AWS-Lambda-and-Apache-airflow
Daily incremental-load ETL pipeline for an e-commerce company using AWS Lambda and an AWS EMR cluster, deployed using Apache Airflow in a Docker container.
aws aws-ec2 aws-emr aws-emr-clusters aws-glue aws-lambda aws-s3 glue-jobs lambda lambda-functions lambda-trigger oltp postgres postgresql redshift redshift-aws docker docker-compose docker-container dockerfile
Language:Python 4
abhibalani / emr_lambda
Lambda to start EMR and run a MapReduce job
aws aws-lambda aws-emr aws-emr-clusters mapreduce-python hadoop-mapreduce
Language:Python 3
anuragkr29 / TightCommunityDetection
Detect Tight Communities in a social Network
scala spark graphx aws aws-emr-clusters kerbosch graphloader amazon-s3
Language:Scala 2
dvu4 / udacity-data-engineering
Data Engineering Projects including Data Modeling, Data Warehouse, Data Lake Development
postgresql etl data-modeling etl-pipeline data-engineering data-engineering-pipeline cassandra-database redshift star-schema amazon-sdk denormalization normalization aws-s3 aws-athena aws-emr-clusters aws-emr aws-lambda spark data-warehouse data-lake
Language:Jupyter Notebook 2
nikhilsu / Product-review-analysis-Spark-MongoDB
Performing various product review analysis on Amazon dataset using Apache Spark and MongoDB
spark spark-sql apache-spark spark-clusters aws-emr-clusters aws-s3 mongodb big-data-analytics
Language:Java 2
rigganni / AWS-Spark-Million-Song-ETL
Load data from the Million Song Dataset into a final dimensional model stored in S3.
aws-emr aws-emr-clusters apache-spark dimensional-model etl parquet parquet-files
Language:Python 2
rshinde03 / Default-Credit-Data-Analysis-and-Prediction-Using-Big-Data
Credit defaults cause large losses for banks and other lenders, and the success of the banking industry depends on the ability to understand risk. This project uses big data technologies such as MapReduce and HDFS, along with PySpark and AWS, to analyze credit history and predict defaults.
aws aws-s3 hadoop-mapreduce hadoop-hdfs cloudera-manager spark pyspark big-data aws-ec2 aws-emr-clusters hive
Language:Jupyter Notebook 2
silviomori / covid19-datalake
data-engineering data-science data-lake aws airflow spark aws-ecs-cluster aws-ecs ecs emr aws-emr aws-emr-clusters docker docker-container aws-s3 boto3 python python3
Language:Python 2
Adith-Rai / Reddit-Stock-Sentiment-Analyzer
A cloud-based Reddit stock sentiment analyzer that computes the overall sentiment for each stock from a configurable selection of stock subreddits. The architecture uses AWS MSK (Kafka), AWS EMR (PySpark), and AWS Lambda (Python 3) for scalability, and the OpenAI API for sentiment analysis through prompt engineering.
aws-ec2 aws-emr-clusters aws-lambda aws-msk openai-api pyspark python3 reddit-api
Language:Python 1
johnnyiller / cluster_funk
An opinionated framework for running big data jobs
aws aws-emr aws-emr-clusters spark pyspark big-data
Language:Python 1
kacperstyslo / most-wanted-programming-skills-finder
With this app, you can see what programming skills are most in-demand in the current job market.
python38 docker docker-compose aws-s3 postgresql pandas javascript css scraper django pyspark serverless-framework shell-scripting aws-lambda-python aws-emr-clusters terraform-aws airflow-dags
Language:Python 1
m1theus / aws-emr-terraform
Example for provisioning AWS EMR service with Terraform
aws-emr aws-emr-clusters emr emr-cluster emr-notebooks terraform terraform-aws
Language:HCL 1
nihil21 / DocxAnonymizer-spark
Stand-alone Scala & Java tool to anonymize OOXML Documents (DOCX)
scala spark anonymisation java aws-emr-clusters parallel-computing parallelization parallel-programming
Language:Java 1
sagardua297 / udacity-data-engineering-nd
Data Pipeline Analytics Platform is an end-to-end generic big data pipeline. It uses the following tech stack: AWS S3, AWS Redshift, AWS EMR Cluster, Apache Spark, Apache Airflow.
python airflow airflow-dags airflow-plugin airflow-operators spark aws-s3 aws-redshift aws-emr-clusters etl-pipeline data-modeling data-engineering postrgresql cassandra data-warehouse data-lake
Language:Python 1
SRVivek1 / pyspark-rdd-dataframe-examples
PySpark RDD and DataFrame Examples
aws aws-db-instance aws-ec2 aws-lambda aws-redshift aws-s3 pyspark python python-lambda python-script rdd aws-emr-clusters
Language:Python 1
UCloudM / Steam_Analysis_For_Gamers
Analysis performed on data from the Steam platform using Apache Spark and Cloud services such as Amazon Web Services.
python aws apache-spark big-data aws-ec2 aws-emr-clusters data-science
Language:Python 1
xianchen2 / Analyzing_10GB_of_Yelp_Reviews_Data
AWS EMR backed Spark cluster for analyzing Yelp Data
pyspark aws-s3 aws-ec2 aws-emr-clusters apache-spark spark-sql
Language:Jupyter Notebook 1
AleGuarnieri / Data-Lake-ETL
Udacity project: implementing an ETL pipeline that processes data with Apache Spark and stores it in AWS S3
data-lake etl aws-s3 aws-emr-clusters
Language:Python 0
arjunsawhney1 / scalable-ML
In this repo, I build a LogisticRegression prediction model with Dask and PySpark and initialize an AWS EMR cluster to run the entire pipeline.
scalability dask dask-distributed dask-ml spark pyspark aws aws-s3 aws-ec2 aws-emr-clusters logistic-regression
Language:Python 0
Chan2k20 / Wine-Prediction-Prediction-Model-On-AWS-EMR
Implements a random forest machine learning algorithm using PySpark on AWS EMR to classify wines. The model is then deployed in a Docker container.
random-forest aws-emr-clusters aws-ec2 docker pyspark-mllib wine-quality-prediction ec2-instance aws-s3
Language:Python 0
EricPaul075 / OCP8-Big-data-project-deployed-in-AWS-cloud
Define a big data architecture and perform distributed machine learning calculations on an EMR cluster using AWS
aws-s3 aws-ec2 aws-emr-clusters big-data keras-models pyspark python tensorflow feature-extraction boto3 pca pillow pyarrow hadoop
Language:Jupyter Notebook 0
marcus-repo / etl-spark
ETL pipeline that extracts JSON files from an AWS S3 bucket, transforms them using an AWS EMR Spark cluster, and stores the data in an AWS S3 bucket in Parquet format.
spark aws-emr aws-emr-clusters pyspark aws-s3
Language:Python 0
mochan42 / Deploy-a-CNN-in-AWS-image-features-extraction-and-ACP
A CNN is deployed in AWS to extract image features in the context of distributed computing.
apache-spark aws-emr-clusters aws-iam aws-s3 big-data cloud distributed-computing
Language:Jupyter Notebook 0
pavva94 / MovieSentimentAnalysis
MLP for sentiment analysis of movie reviews.
scala aws-emr-clusters spark mllib
Language:Scala 0
Prajna-Bahuguna / EventBridge-SNS-Terraform
aws aws-eventbridge aws-sns aws-emr-clusters terraform iac
Language:HCL 0
SagarFall2022 / BigData
Realtime data pipeline
aws-dynamodb aws-emr-clusters aws-kinesis-firehose aws-kinesis-stream aws-lambda aws-s3 datapipeline etl kafka-consumer kafka-producer kafka-streams serverless-framework
Language:Jupyter Notebook 0
tugberkcapraz / capstone_sparkify
Predicting customer churn for the music app, Sparkify, using PySpark on AWS EMR clusters
aws-emr-clusters churn-prediction pyspark
Language:Jupyter Notebook 0
AhmedDouaya / Deploiement_modele_cloud
aws-emr-clusters spark
Language:Jupyter Notebook
im612 / P8_big_data
A scalable prototype of an image recognition engine deployed on AWS.
aws-emr-clusters aws-s3 aws-sagemaker pyspark python3
Language:Jupyter Notebook
justinapnguyen / Big_Data_Wrangling_with_Google_Books_Ngrams
In this project, the skills learned in the Big Data Fundamentals unit will be utilized to load, filter, and visualize a large real-world dataset within a cloud-based distributed computing environment using Hadoop, Spark, Hive, and the S3 filesystem.
aws aws-emr-clusters aws-s3 big-data hadoop python spark pyspark ssh
Language:Jupyter Notebook
polarbeargo / Udacity-nd027-Data-Lake
aws-s3 aws-emr-clusters pyspark-python
Language:Python
