There are 4 repositories under the hadoop-framework topic.
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Cloud-based SQL engine using Spark, where data is accessible as a JDBC/ODBC data source via the Spark Thrift Server.
I installed Hadoop on a virtual machine, and all assignments were performed on Ubuntu. Refer to this repo when completing the Hadoop assignments. A stable internet connection is recommended while working through them.
This repository contains a simple Hadoop-like (MapReduce) distributed computing platform implemented in Java. It extends a UIUC course project that was awarded best Java implementation, and it is open-sourced for reference.
Toy Hadoop cluster combining various SQL-on-Hadoop variants
Code samples, summaries, cheatsheets and other study material for Hadoop MapReduce and Apache Spark
Big data study notes for absolute beginners
A reference repository for a comprehensive guide to installing Hadoop on Windows
Twitter data analysis using Hadoop (HDFS), Flume, MapReduce, and Hive. Sentiment analysis is also performed on tweets related to the Indian election using the AFINN dictionary.
The goal of this project is to identify flood-prone areas and estimate the probability of flooding in counties on a future date, using Spark MLlib.
This repo contains the steps for setting up a single-node Hadoop 3.2.1 cluster on Ubuntu 20.04 LTS
Docker image for a single-node EMR 5.25.0 Hadoop cluster, with Amazon Linux, Hadoop 2.8.5, and Hive 2.3.5
PageRank algorithm implemented in Java with the MapReduce framework
Product recommendation system built on an Amazon product dataset using the Apache Spark framework
MapReduce Python Example
Python Scripts for working with Big Data Files
A basic introductory example of using Hadoop's MapReduce libraries to load and analyse large datasets, in this case a US patent dataset sourced from https://www.nber.org/research/data/us-patents
Hadoop-Cluster
Distributed Hadoop- and Spark-based framework for in-memory GIS queries
MapReduce in Cluster.
An Ansible role to configure and set up a Hive data warehouse on a client node.
Titanic data analysis with Hadoop