There are 6 repositories under the hadoop-ecosystem topic.
A curated list of awesome System Design (A.K.A. Distributed Systems) resources.
HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
Life cycle: internal workings of HDFS, Sqoop, Hive, Spark, HBase, and Kafka, with code.
Hadoop 3.2 in single-node or cluster mode with the gotty web terminal, Spark, Jupyter with PySpark, Hive, and other ecosystem tools.
Instructions for setting up Hadoop, HDFS, Java, sbt, Kafka, Scala, Spark, and Flume on Ubuntu 18.04
Dockerfile for running Apache Knox (http://knox.apache.org/) in Docker
Analysis of YouTube data using the Hadoop MapReduce framework in Java.
The goal of this project is to identify flood-prone areas and estimate the probability of flooding in counties on a future date, using Spark MLlib.
EMR 5.25.0 single-node cluster Hadoop Docker image, with Amazon Linux, Hadoop 2.8.5, and Hive 2.3.5
Helm chart for Apache Knox
A large-scale distributed data processing system for streaming analytics, built in the cloud on the Hadoop ecosystem (Apache Spark and HDFS) for real-time spatial analytics.
Customer big data is stored and analyzed using Hadoop and related tools such as Hive, ZooKeeper, HBase, and Sqoop. All customer details are analyzed and the results are reported, which is useful for companies.
This project focuses on analyzing movie data using PySpark, tailored for efficient data processing on the Hadoop Distributed File System (HDFS).
Hadoop ecosystem
Practice programs in the Hadoop ecosystem, for reference
Spark Streaming & Kafka Quick Start Tutorial
MapReduce program developed in Java for analyzing a movie dataset
Collecting tweets with the Flume service and analyzing them
[BigData] One-year weblog analysis using Pig
Hadoop Projects
Some basic procedures for parallel computing in the Hadoop environment
Apache Hadoop Components Installation Guide on Windows
Containers used for parallel batch and stream data processing and for setting up a machine learning environment
Principles and hands-on practice of HDFS, MapReduce, Hive, and ZooKeeper
Processing and transforming data via the Hadoop ecosystem
Environment for practicing the use of the Ansible and Hadoop tools on a single instance
This repository will be updated based on the challenges I encounter in installing and using Hadoop tools and Spark
Learn and implement the Hadoop Ecosystem to drive Big Data Analytics.
[Work in progress] Client library for simplified access to Apache Accumulo