There are 9 repositories under the hadoop-cluster topic.
Apache Hadoop docker image
Big Data essentials: Hadoop, MapReduce, Spark. Explore tutorials and demos in Jupyter notebooks—most are self-contained and live, ready to run with a click.
Jumbune, an open source Big Data APM & Data Quality Management platform for data clouds. An enterprise feature offering is available at http://jumbune.com. More details of the open source offering are at,
A Docker container with a full Hadoop cluster setup with Spark and Zeppelin
Ansible playbook to deploy Cloudera Hadoop components to a cluster
Dockerizing an Apache Spark Standalone Cluster
A system designed to analyse big data collected from Wi-Fi probes
A fully functional Hadoop YARN cluster as a docker-compose deployment.
Apache Ignite Guide
Toy Hadoop cluster combining various SQL-on-Hadoop variants
Hadoop 3.2 in single-node or cluster mode, with the gotty web terminal, Spark, Jupyter with PySpark, Hive, and other ecosystem components.
Workshop for the Professional Master's in Computer Science at UGR. Cloud Computing course.
Run Hadoop Cluster within Docker Containers
Dockerfile for running Apache Knox (http://knox.apache.org/) in Docker
Docker image builds for Hadoop sandbox.
Collection of various clustering algorithms, including k-means, HAC, and DBSCAN. Also includes a Hadoop MapReduce implementation of the k-means algorithm
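For reference, the k-means algorithm that such repositories distribute over MapReduce can be sketched on a single machine as plain Lloyd's algorithm: the point-to-centroid assignment corresponds to the map phase and the centroid recomputation to the reduce phase. This is an illustrative sketch, not code from any of the listed repositories:

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm: repeat assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step (the "map" phase in a MapReduce version).
        clusters = {i: [] for i in range(k)}
        for p in points:
            nearest = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step (the "reduce" phase): mean of each non-empty cluster.
        for i, members in clusters.items():
            if members:
                centroids[i] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return centroids
```

In the distributed variant, each mapper assigns its shard of points to the nearest centroid and each reducer averages the points of one cluster; the driver iterates until the centroids stabilise.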
A reference to a comprehensive guide on installing Hadoop on Windows
Self-documentation of learning distributed data storage, parallel processing, and Linux using Apache Hadoop, Apache Spark, and Raspbian OS. In this project, a 3-node cluster is set up on Raspberry Pi 4 boards, HDFS is installed, and Spark processing jobs are run via YARN.
In this task, we calculate the average temperature for each year from the given dataset stored in Hadoop HDFS, using a MapReduce job written for that purpose.
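The per-year average described above follows the standard MapReduce pattern: the mapper emits (year, temperature) pairs, the framework groups them by year, and the reducer averages each group. The minimal sketch below simulates that pipeline locally and assumes input records of the form `year,temperature` (the actual dataset format is not given in the description):

```python
from itertools import groupby

def mapper(lines):
    """Emit (year, temperature) pairs from 'year,temperature' records."""
    for line in lines:
        year, temp = line.strip().split(",")
        yield year, float(temp)

def reducer(pairs):
    """Average the temperatures for each year (pairs grouped by key)."""
    for year, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        temps = [t for _, t in group]
        yield year, sum(temps) / len(temps)

if __name__ == "__main__":
    # Local simulation of the map -> shuffle/sort -> reduce pipeline.
    records = ["1950,12.0", "1950,14.0", "1951,10.0"]
    for year, avg in reducer(mapper(records)):
        print(f"{year}\t{avg:.1f}")
```

On a real cluster the same mapper and reducer could be run as Hadoop Streaming scripts reading stdin and writing tab-separated key/value lines, with the shuffle/sort stage replacing the explicit `sorted` call.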
Hadoop cluster in Docker, created with docker-compose. Spin up a Hadoop cluster in under 5 minutes.
Analyses customer logs for big data components such as HDFS, Hive, HBase, YARN, MapReduce, Storm, Spark, Spark 2, Knox, Ambari Metrics, NiFi, Accumulo, Kafka, Flume, Oozie, Falcon, Atlas, and ZooKeeper.
This project shows how to perform spatio-temporal hot-spot analysis using Apache Spark.
Kubernetes operator for managing the lifecycle of Apache Hadoop YARN tasks on Kubernetes.
Design, build, and execute effective big data strategies with advanced Hadoop concepts
Movie rating prediction application
BigData Cluster with Docker
A repository of scripts that help create a distributed big data ecosystem on the Grid5000 platform.
This project creates a Hadoop and Spark cluster on Amazon AWS with Terraform
Containerized Hadoop cluster with Spark, Hive, Pig, HBase, and Zookeeper for scalable Big Data processing using Docker.
Deploy a big data platform on Kubernetes