There is 1 repository under the spark-clusters topic.
A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support
Command-line interface for a Spark cluster management app
This project provides end-to-end data processing and visualization of visa numbers in Japan using PySpark and Plotly. The Spark clusters are set up within a Docker container on Azure.
Repository/Tutorial for initializing Jupyter Notebook and a Spark cluster on Amazon EMR
This project creates a Hadoop and Spark cluster on AWS with Terraform
A Python library to submit Spark jobs to a YARN cluster on different distributions (currently CDH and HDP)
A collection of scripts to easily start HDFS and Spark clusters
Various product review analyses on an Amazon dataset using Apache Spark and MongoDB
Spark on Kubernetes PoCs
Template for Spark Data Science Projects
Docker image to deploy a Spark cluster in containers
Research on setting up and using a Spark Standalone multi-node cluster.
Work done on AWS. Collects the steps for creating a Spark cluster on EC2.
Terraform module to create Azure HDInsight, a managed, full-spectrum, open-source analytics service. This module creates Apache Hadoop, Apache Spark, Apache HBase, Interactive Query (Apache Hive LLAP), and Apache Kafka clusters.
Spark cluster management with Docker