apache-hadoop

There are 5 repositories under apache-hadoop topic.

mahmoudparsian / data-algorithms-book
MapReduce, Spark, Java, and Scala for Data Algorithms Book
hadoop-mapreduce java distributed-computing scala mapreduce data-algorithms python machine-learning pyspark distributed-algorithms mappers reducers apache-hadoop apache-spark design-patterns partitioning
Language:Java 1062
mahmoudparsian / big-data-mapreduce-course
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
pyspark-algorithms-book mapreduce santa-clara-university pyspark data-algorithms data-transformation data-partition partitioning-algorithms algorithms mapreduce-python mapreduce-algorithm apache-hadoop apache-spark big-data data-analysis data-engineering glossary monoid spark-dataframes spark-rdd
Language:HTML 150
tencentyun / hadoop-cos
hadoop-cos（CosN文件系统）为Apache Hadoop、Spark以及Tez等大数据计算框架集成提供支持，可以像访问HDFS一样读写存储在腾讯云COS上的数据。同时也支持作为Druid等查询与分析引擎的Deep Storage
apache-hadoop alluxio hadoop-compatible-filsystem tencent-cloud-cos
Language:Java 79
s911415 / apache-hadoop-3.1.0-winutils
HADOOP 3.1.0 winutils
hadoop winutils apache-hadoop native
Language:Batchfile 71
PBWebMedia / yarn-prometheus-exporter
Export Hadoop YARN (resource-manager) metrics in prometheus format
yarn hadoop resource-manager prometheus exporter metrics apache-hadoop apache yarn-hadoop-cluster
Language:Go 47
realtimedatalake / hive-metastore-docker
Containerized Apache Hive Metastore for horizontally scalable Hive Metastore deployments
apache-hive apache-hive-metastore apache-hadoop big-data docker docker-compose postgresql rtdl open-source
Language:Dockerfile 9
Guru107 / hadoop-small-files-merger
A Spark application to merge small files on Hadoop
apache-spark apache-hadoop scala avro parquet text
Language:Scala 8
spark-minimal-algorithms
kowaalczyk / spark-minimal-algorithms
An python implementation of Minimal Mapreduce Algorithms for Apache Spark
algorithms apache-hadoop apache-spark hadoop-mapreduce minimal-algorithms pyspark python python3 spark
Language:Python 6
Coursal / Hadoop-Examples
Some simple, kinda introductory projects based on Apache Hadoop to be used as guides in order to make the MapReduce model look less weird or boring.
hadoop hadoop-mapreduce apache-hadoop examples java hadoop-example mapreduce mapreduce-java
Language:Java 5
mohammadtavakoli78 / Cloud-Computing
This is projects of Cloud Computing Course
apache-hadoop cloud-computing cloud-services docker docker-compose hadoop hdfs helm helm-chart helm-charts kubernetes statefulset statefulsets yarn
Language:Python 5
RBC-DSAI-IITM / DCEIL
A fast, scalable and distributed community detection algorithm based on CEIL scoring function.
community-detection apache-spark apache-hadoop
Language:Scala 5
haodemon / HadoopStreaming
Set of Input Formats for Hadoop Streaming
hadoop apache-hadoop inputformat
Language:Java 4
nghoanglong / spark-cluster-with-docker
The implementation of Apache Spark (combine with PySpark, Jupyter Notebook) on top of Hadoop cluster using Docker
apache-spark apache-hadoop docker
Language:Shell 4
bdoepf / aws-emr-prometheus
aws prometheus emr emr-cluster apache-spark apache-flink apache-hadoop
Language:HCL 3
chriskery / hadoop-operator
Kubernetes operator for managing the lifecycle of Apache Hadoop Yarn Tasks on Kubernetes.
hadoop kubernetes apache-hadoop kubernetes-operator hadoop-cluster k8s
Language:Go 3
felidsche / mail-spam-filter
An email spam filter using Apache Spark’s ML library
apache-spark spark-ml apache-hadoop
Language:Python 3
jagdish4501 / Network-intrusion-Detection
This repository provides a guide to preprocess and analyze the network intrusion data set using NumPy, Pandas, and matplotlib, and implement a random forest classifier machine learning model using Scikit-learn.
numpy pandas apache-hadoop libcap npcap scapy scikit-learn matplotlib
Language:Jupyter Notebook 3
Jordan396 / Giraph-1.2.0-Installation
Instructions for Installing Giraph-1.2.0
apache-hadoop giraph ubuntu1804 virtual-machine google-cloud
3
whoami-anoint / EasyHadoop
Simplified Hadoop Setup and Configuration Automation
apache-hadoop big-data big-data-analytics big-data-essentials big-data-projects data-science ec2-instance hdfs hdfs-cluster
Language:Shell 3
Abdelhakim-gh / BigData_Project
This project aims to establish a data streaming pipeline with storage, processing, and visualization
apache-flink apache-hadoop apache-kafka elasticsearch github-api kibana python
Language:Python 2
saitejavishalj / Hotspot-analysis-of-Geospatial-data
Built a Large Scale Distributed Data Processing system for Streaming Analytics using Hadoop Ecosystem (Apache Spark and HDFS), in Cloud for real-time spatial analytics.
hdfs sparksql apache-spark apache-hadoop hadoop-ecosystem data-analysis distributed-systems large-scale streaming-analytics
Language:Scala 2
surbhitawasthi / MiniProject-AadharCensusDataValidation
A small code to validate the Census data on the basis of Aadhar Data
apache-hadoop mapred hdfs java
Language:Java 2
yingzhuo / logback-flume-appender
logback appender for apache-flume
apache-hadoop apache-flume apache-hive logback logback-appender slf4j logback-flume-appender flume
Language:Java 2
aaqib-ahmed-nazir / BDA_Assignment02
This repository aims to develop a basic search engine utilizing Hadoop's MapReduce framework to index and process extensive text corpora efficiently. The dataset used for this project is a subset of the English Wikipedia dump, totaling 5.2 GB in size. The project focuses on implementing a naive search algorithm to address challenges in information.
apache-hadoop hadoop jupyter-notebook mapreduce python search-algorithm search-engine jupiter-notebook mapreduce-python python3
Language:Jupyter Notebook 1
Abdelhakim-gh / Spark_MinProject
The goal of this project is to learn data processing using Spark with practical examples on datasets and also apply programming with Scala.
apache-hadoop apache-hive apache-spark scala
Language:HTML 1
carlosemsantana / docker-hadoop
Preparação de um ambiente de desenvolvimento e testes para Apache Hadoop.
apache-hadoop docker docker-hadoop documentation
1
Coursal / Text-Sentiment-Analysis-In-Hadoop-And-Spark
The source code developed and used for the purposes of my thesis with the same title under the guidance of my supervisor professor Vasilis Mamalis for the Department of Informatics and Computer Engineering of the University of West Attica.
text-mining text-classification naive-bayes-classifier naive-bayes naive-bayes-classification support-vector-machines support-vector-machine-svm mapreduce hadoop hadoop-mapreduce spark spark-mllib sentiment-analysis sentiment-classification opinion-mining apache-hadoop apache-spark
Language:Java 1
esakik / data-engineering-essentials
Samples related to data engineering, e.g. spark, embulk, airflow, etc.
apache-beam apache-spark apache-airflow apache-hadoop fluentd embulk digdag mrjob cloud-dataproc cloud-dataflow amazon-emr protocol-buffers apache-avro data-engineering
Language:Python 1
FlightAnalysis
Lucass97 / FlightAnalysis
This project implemented a lambda architecture for analyzing domestic flight data in the US from 2009 to 2020. It used Apache Spark for batch processing, Spark Streaming for real-time analysis, and SVM models to predict flight cancellations and delays, with Docker for cluster management and Grafana for real-time visualization.
influxdb python roma-tre-university cassandra docker jupyter-notebook kafka portainer spark spark-streaming apache-hadoop apache-mllib apache-spark bigdata flights-data flights-delays hadoop-hdfs spark-mllib spark-sql lambda-architecture
Language:Jupyter Notebook 1
Narius2030 / Sakila-Business-Analysis
Implement a Hive data warehouse to store meaningful data, apply Machine Learning like Clustering or Regression for dealing with business problems
apache-hadoop apache-hive data-analysis hiveql etl-pipeline machine-learning statistics
Language:Jupyter Notebook 1
TrentBrunson / TrentBrunson.github.io
My portfolio | under development
analytics big-data machine-learning python pytorch apache-hadoop aws deep-learning django flask google-colab nlp pyspark spark tensorflow webscraping yarn
Language:HTML 1
Trisha11r / covid_data_analysis_mapreduce
COVID-19 data analysis with MapReduce
hadoop-mapreduce hadoop-hdfs covid-2019 java-8 data-analysis database-management apache-hadoop
Language:Java 1
tspannhw / links
Links
scala apache-spark sbt apache-hadoop
Language:Scala 1
dmarks84 / Coursework_Capstone_Full_Data_Engineering
Final Project for IBM Data Engineering & Python Professional Certificate -- Applied all skills and methods utilized in the series of courses for this certification
apache-airflow apache-hadoop apache-kafka apache-spark api beautifulsoup cassandra dags etl mongodb nosql pandas plotly postgresql python scipy seaborn sql
Language:Jupyter Notebook 0
rachmanz / WSL2DW
Intalasi WSL2 untuk Praktikum ABD
apache-hadoop apache-hive derby-database
0
SomeshChevella / Apache-Hadoop-Map-Reduce--Basic-Sentiment-Analysis-on-Yelp-Dataset
In this project we will use Hadoop MapReduce to implement a very basic “Sentiment Analysis” using the review text in the Yelp Academic Dataset as training data.
apache-hadoop mapreduce-java
Language:Java

apache-hadoop

mahmoudparsian / data-algorithms-book

mahmoudparsian / big-data-mapreduce-course

tencentyun / hadoop-cos

s911415 / apache-hadoop-3.1.0-winutils

PBWebMedia / yarn-prometheus-exporter

realtimedatalake / hive-metastore-docker

Guru107 / hadoop-small-files-merger

kowaalczyk / spark-minimal-algorithms

Coursal / Hadoop-Examples

mohammadtavakoli78 / Cloud-Computing

RBC-DSAI-IITM / DCEIL

haodemon / HadoopStreaming

nghoanglong / spark-cluster-with-docker

bdoepf / aws-emr-prometheus

chriskery / hadoop-operator

felidsche / mail-spam-filter

jagdish4501 / Network-intrusion-Detection

Jordan396 / Giraph-1.2.0-Installation

whoami-anoint / EasyHadoop

Abdelhakim-gh / BigData_Project

saitejavishalj / Hotspot-analysis-of-Geospatial-data

surbhitawasthi / MiniProject-AadharCensusDataValidation

yingzhuo / logback-flume-appender

aaqib-ahmed-nazir / BDA_Assignment02

Abdelhakim-gh / Spark_MinProject

carlosemsantana / docker-hadoop

Coursal / Text-Sentiment-Analysis-In-Hadoop-And-Spark

esakik / data-engineering-essentials

Lucass97 / FlightAnalysis

Narius2030 / Sakila-Business-Analysis

TrentBrunson / TrentBrunson.github.io

Trisha11r / covid_data_analysis_mapreduce

tspannhw / links

dmarks84 / Coursework_Capstone_Full_Data_Engineering

rachmanz / WSL2DW

SomeshChevella / Apache-Hadoop-Map-Reduce--Basic-Sentiment-Analysis-on-Yelp-Dataset