Repositories under the hadoop-docker topic:
Apache Hadoop docker image
A Docker-based Hadoop development and testing environment, including Hadoop, Hive, HBase, and Spark
Dockerizing an Apache Spark Standalone Cluster
Toy Hadoop cluster combining various SQL-on-Hadoop variants
Hadoop 3.2 in single-node or cluster mode, with the gotty web terminal, Spark, Jupyter PySpark, Hive, and other ecosystem tools
Run Hadoop Cluster within Docker Containers
We explore data using Big Data analysis and visualization techniques through three main operations: (i) data aggregation from different sources, (ii) Big Data analysis using MapReduce, and (iii) visualization through Tableau. Data analysis is critical for understanding data and deciding what can be done with it. Small datasets are easy to process, but large companies need to track trends in their data to decide what changes to make, so we apply Big Data analysis to this problem. In this lab, we collect close to 20,000 tweets, 500 New York Times articles, and 500 Common Crawl articles about Entertainment, our main topic of discussion. We preprocess this data and feed it to MapReduce jobs that compute Word Count and Word Co-Occurrence, from which we identify trends in the collected data. The data analysis is done in Python.
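The word-count step described above can be sketched as a Hadoop Streaming-style mapper and reducer in Python. This is a hypothetical illustration, not the repository's actual code; the function names and sample lines are made up, and in a real Streaming job the mapper and reducer would be separate scripts reading stdin.

```python
#!/usr/bin/env python3
# Illustrative sketch of a Streaming-style word-count job (not the repo's code).
from itertools import groupby

def mapper(lines):
    # Map phase: emit (word, 1) for every word in every input line.
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1

def reducer(pairs):
    # Reduce phase: pairs arrive grouped by key (Hadoop's shuffle sorts them);
    # sum the counts for each distinct word.
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    # Local simulation of the map -> shuffle -> reduce pipeline on toy tweets.
    sample = ["new movie release", "new show release"]
    print(dict(reducer(mapper(sample))))
```

Co-occurrence counting follows the same pattern, with the mapper emitting word pairs instead of single words.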
A Spark/Hadoop-Docker Cluster template for working with Big Data
Hadoop cluster in Docker, created with docker-compose. Create a Hadoop cluster in less than 5 minutes.
Big Data stack with Hadoop + Hive + Spark + Zeppelin + Hue + Superset
Hadoop deployment on docker and Docker Swarm
Run Apache Hadoop 2.7 inside docker container in pseudo-distributed mode
Apache Hadoop Cluster Docker images
EMR 5.25.0 single-node cluster Hadoop Docker image, with Amazon Linux, Hadoop 2.8.5, and Hive 2.3.5
Experiments with Hadoop cluster setups in Docker
Run Apache Hadoop 2.7 inside docker container in Multi-Node Cluster mode
Exercise files for Apache Hadoop Big Data Training
Apache Hadoop Docker Image
Data processing using docker containers, kafka, spark, and hadoop
An automated Hadoop cluster-building tool that uses distributed computing to create the cluster over the network. Implemented in Python 2.7.
Building a recommender system based on a collaborative-filtering algorithm and Hadoop Streaming
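The collaborative-filtering approach mentioned above can be illustrated with a minimal item-based similarity computation in Python. This is a sketch under assumed data shapes (a `user -> item -> rating` dict), not the repository's actual implementation; a Hadoop Streaming version would split these steps into mapper/reducer scripts.

```python
# Minimal item-based collaborative-filtering sketch (illustrative only).
from collections import defaultdict
from math import sqrt

def item_vectors(ratings):
    # Invert user -> item ratings into item -> user rating vectors.
    vecs = defaultdict(dict)
    for user, items in ratings.items():
        for item, r in items.items():
            vecs[item][user] = r
    return vecs

def cosine(u, v):
    # Cosine similarity between two sparse rating vectors.
    shared = set(u) & set(v)
    num = sum(u[k] * v[k] for k in shared)
    den = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

def similar_items(ratings, item):
    # Rank all other items by cosine similarity to the given item.
    vecs = item_vectors(ratings)
    return sorted(
        ((other, cosine(vecs[item], vecs[other])) for other in vecs if other != item),
        key=lambda kv: -kv[1],
    )
```

For example, with `ratings = {"u1": {"a": 5, "b": 5}, "u2": {"a": 4, "b": 4, "c": 1}}`, item `b` ranks as the most similar item to `a`.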
Hadoop-Cluster
Build Hadoop with Docker for Ubuntu. See releases for different architectures such as armv7l
Hadoop cluster on Docker (single host)
Compile hadoop in docker container
Apache Pig Latin script to count letters in multiple input text files, using the HortonWorks Hadoop Sandbox or Google Cloud Platform