There are 13 repositories under the hadoop-mapreduce topic.
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Cloud Shuffle Service (CSS) is a general-purpose remote shuffle solution for compute engines, including Spark, Flink, and MapReduce.
Distributed crawling of data from all Chinese universities using Hadoop MapReduce.
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Big data projects implemented by Maniram Yadav
K-Means algorithm implementation with Hadoop and Spark for the Cloud Computing course of the MSc AIDE at the University of Pisa.
A collection of mapreduce problems and solutions
Chinese text mining | Public opinion analysis | Hadoop | Java | MapReduce
Projects done in the Cloud Computing course.
Source code for the examples in the book Cloud Computing Solutions Architect: A Hands-On Approach by Arshdeep Bahga and Vijay Madisetti
Twitter + Flume + Hadoop (HDFS, MapReduce) + Neo4j + Python
Data Engineering Course
Search Engine projects
2021 Spring (Distributed Computing Systems): Distributed Systems and Programming
I installed Hadoop on a virtual machine, and all assignments were performed on Ubuntu. Refer to this repo when completing the Hadoop assignments. A stable internet connection is recommended while working through them.
Helm chart for Apache Hadoop using multi-arch docker images
Student projects in the Big Data field.
This repository contains a simple Hadoop-like (MapReduce) distributed computing platform implemented in Java. It extends a course project at UIUC that was awarded best Java implementation, and it is open-sourced for reference.
Toy Hadoop cluster combining various SQL-on-Hadoop variants
Hadoop 3.2 in single-node or cluster mode with the gotty web terminal, Spark, Jupyter with PySpark, Hive, and other ecosystem tools.
A MapReduce program to conduct sentiment analysis of a keyword from a list of comments.
Computing PageRank with Hadoop MapReduce
A Genetic Algorithms framework for Hadoop MapReduce.
Code samples, summaries, cheatsheets and other study material for Hadoop MapReduce and Apache Spark
A select few projects related to Big Data Analytics and Management. The list mixes small and large projects, all chosen for being interesting.
My practice work and project on PySpark
Easy parallel map-reduce command line tool
Hadoop MapReduce program to compute multiplication of two sparse matrices
Simulates the data transfer to explore caching potential in network nodes running Hadoop over NDN (Named Data Networking) rather than traditional TCP/IP.
Monitor your Oozie server and your Oozie bundles with Graphite