hadoop-hdfs

There are 14 repositories under the hadoop-hdfs topic.

seaweedfs / seaweedfs
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lakes, scaling to billions of files. The blob store has O(1) disk seek and cloud tiering. The Filer supports Cloud Drive, cross-data-center (xDC) replication, Kubernetes, POSIX FUSE mount, the S3 API, an S3 Gateway, Hadoop, WebDAV, encryption, and Erasure Coding. An enterprise version is available at seaweedfs.com.
distributed-storage distributed-systems s3 hdfs fuse distributed-file-system hadoop-hdfs posix tiered-file-system kubernetes replication object-storage s3-storage seaweedfs erasure-coding blob-storage cloud-drive
Language: Go · Stars: 25793
OBenner / data-engineering-interview-questions
More than 2,000 data engineering interview questions.
data-engineering interview-questions interview hadoop hadoop-hdfs spark flink sql kafka hive impala airflow aws azure cassandra flume hbase avro nifi data-structures
Stars: 1408
Morphl-AI / MorphL-Community-Edition
MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services, with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization.
artificial-intelligence machine-learning user-experience conversion-rate-optimization front-end-development data-driven-design product-development morphl-platform pyspark cassandra kubernetes hadoop-hdfs pipeline
Language: Python · Stars: 261
linkedin / dynamometer
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
hadoop hadoop-filesystem hdfs hdfs-dfs testing testing-tools scale scale-up performance-testing performance-test performance-analysis performance-metrics hadoop-framework hadoop-hdfs
Language: Java · Stars: 131
AhmetFurkanDEMIR / Data-Engineering-Project-with-HDFS-and-Kafka
Data Engineering Project with Hadoop HDFS and Kafka
data data-engineer data-engineering data-engineering-pipeline docker docker-compose hadoop hadoop-filesystem hadoop-hdfs hdfs hdfs-client hdfs-dfs kafka kafka-consumer kafka-producer kafka-ui pipline python python-hdfs-client kafkaui
Language: Python · Stars: 117
groda / big_data
Tutorials on Big Data essentials: Hadoop, MapReduce, Spark. Explore a variety of tutorials and demonstrations on Big Data technologies, primarily in the form of Jupyter notebooks. Most notebooks are self-contained and live, ready to run with a click.
big-data bigdata spark spark-sql docker mapreduce mapreduce-bash pyspark hadoop testdfsio jupyter-notebook apache-sedona hadoop-cluster hadoop-hdfs mrjob gutenberg-ebooks hadoop-mapreduce apache-spark bigtop
Language: Jupyter Notebook · Stars: 73
IBM / sparksql-for-hbase
Learn how to use Spark SQL and the HSpark connector package to create and query data tables that reside in HBase region servers.
hbase spark sql nosql hadoop-hdfs apache-spark ibmcode
Stars: 69
vim89 / datapipelines-essentials-python
A simplified ETL process in Hadoop using Apache Spark, with a complete ETL pipeline for a data lake. Includes SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
apache-spark spark spark-sql python python3 pyspark etl etl-pipeline etl-framework etl-components xml xml-parsing datalake big-data hadoop hadoop-mapreduce hadoop-hdfs data-pipeline
Language: Python · Stars: 55
maniram-yadav / Big_DataHadoop_Projects
Big data projects implemented by Maniram Yadav
spark pig-latin pig hadoop hdfs sqoop hive mapreduce big-data-analytics big-data-projects hadoop-mapreduce hadoop-hdfs flume
Language: PigLatin · Stars: 51
jarlor / TravelWebsite_BigDataAnalysis
Big data analysis of a travel website (partial Ctrip data): a Hadoop course design project (undergraduate coursework level)
bigdata coursework hadoop-hdfs java mapreduce
Language: Java · Stars: 37
hokstack / hok-helm
HokStack - Run Hadoop Stack on Kubernetes
hadoop hdp kubernetes operator bigdata hadoop-cluster hadoop-hdfs automation dataops devops-tools
Language: Shell · Stars: 24
torqbit / databox
Open source data infrastructure platform. Designed for developers, built for speed.
data-ops hadoop hadoop-hdfs kafka spark
Language: TypeScript · Stars: 22
SepehrImanian / ansible-hadoop-hdfs
Ansible playbook for setting up Hadoop HDFS
ansible ansible-playbook hadoop hadoop-hdfs hdfs
Language: Jinja · Stars: 20
hadoop-sandbox / hadoop-sandbox
A fully functional Hadoop YARN cluster as a docker-compose deployment.
docker docker-compose hadoop hadoop-cluster hadoop-hdfs hadoop-yarn
Language: Shell · Stars: 18
pfisterer / apache-hadoop-helm
Helm chart for Apache Hadoop using multi-arch docker images
hadoop hadoop-hdfs hadoop-mapreduce hadoop-filesystem helm-chart helm docker kubernetes
Language: Dockerfile · Stars: 18
lucas91batista / twitter-hashtag-graph
Twitter + Flume + Hadoop (HDFS, MapReduce) + Neo4j + Python
twitter apache-flume hadoop hadoop-mapreduce hadoop-hdfs neo4j
Language: JavaScript · Stars: 15
PChou / marayarn
Marathon on YARN
marathon yarn hadoop-hdfs
Language: Java · Stars: 13
alagrede / HdfsClient
A Java HDFS client example and a full Kerberos example for calling Hadoop commands directly in Java code or on your local machine.
hadoop hadoop-hdfs kerberos kerberos-authentication
Language: Java · Stars: 12
waltherg / distributable_docker_sql_on_hadoop
Toy Hadoop cluster combining various SQL-on-Hadoop variants
hadoop hadoop-mapreduce hadoop-filesystem hadoop-cluster hadoop-docker hadoop-hdfs hadoop-framework hive hue spark sparksql hbase hbase-client yarn yarn-hadoop-cluster zookeeper zookeeper-deployment tez impala presto
Language: Shell · Stars: 12
Areesha-Tahir / Hadoop-MapReduce-Sentiment-Analysis-Through-Keywords
A MapReduce program to conduct sentiment analysis of a keyword from a list of comments.
mapreduce parallel-computing parallel-programming sentiment-analysis code project java ubuntu hadoop-mapreduce hadoop-hdfs hadoop
Language: Java · Stars: 11
jodth07 / hadoop-installation
Instructions for setting up Hadoop, HDFS, Java, sbt, Kafka, Scala, Spark, and Flume on Ubuntu 18.04
hadoop hadoop-hdfs hadoop-ecosystem kafka installation scala sbt spark flume spark-installation hadoop-installation kafka-installation sbt-installation scala-installation
Language: Shell · Stars: 8
leibniz21c / mammoth
Mammoth is a container-based log analyzer for Hadoop distributed systems. Sponsored by Mantech and Naver Cloud Platform.
msa log-analyzer influxdb mongodb adobe-xd fastapi hadoop-hdfs yarn mapreduce flutter-app dart python3 docker docker-compose
Language: Dart · Stars: 8
Mahmoud-nfz / football-big-data
This is a comprehensive solution for real-time football analytics, leveraging Apache Spark running on YARN for both streaming and batch processing, Hadoop HDFS for distributed storage, Kafka for real-time data ingestion, RethinkDB for live data updates, a custom-built search engine, and Next.js for data visualization.
hadoop hadoop-hdfs kafka nextjs search-engine spark spark-streaming t3-stack rethinkdb
Language: TypeScript · Stars: 8
LMAPcoder / Hadoop-on-Colab
Installation and configuration of Hadoop on Google Colaboratory
hadoop hadoop-hdfs hadoop-installation hadoop-mapreduce hadoop-streaming
Language: Jupyter Notebook · Stars: 7
mgarralda / hadoop-spark-cluster
Repository containing Docker images for creating a Spark cluster on Hadoop YARN.
hadoop-hdfs spark-cluster spark-hadoop spark-hadoop-docker spark-yarn-docker spark
Language: Dockerfile · Stars: 7
briandi26 / Machine-Learning-for-Forest-Fire-Prediction
Machine Learning for Forest Fire Prediction using the Hadoop ecosystem and Spark tools (PySpark)
machine-learning forest-fire-model hadoop-hdfs spark pyspark
Language: Python · Stars: 6
HxnDev / Hadoop-MapReduce-to-Analyze-Sentiment-of-Keyword
In this task, we had to write a MapReduce program to analyze the sentiment of a keyword from a list of comments. This was done using Hadoop HDFS.
hadoop hadoop-mapreduce hadoop-hdfs mapreduce mapreduce-java hdfs parallel-computing parallel-programming sentiment-analysis sentimental-analysis java code
Language: Java · Stars: 6
prabal03 / python-automation-in-linux
Python automation in linux
hadoop-hdfs docker webserver lvm aws linux
Language: Python · Stars: 6
Ren294 / Covid-Data-Process
This project integrates real-time data processing and analytics using Apache NiFi, Kafka, Spark, Hive, and AWS services for comprehensive COVID-19 data insights.
airflow aws aws-ec2 aws-quicksight big-data covid19-data docker docker-compose hadoop-hdfs hdfs hive kafka nifi redpanda spark spark-sql spark-streaming big-data-analytics pipeline sparksql
Language: Shell · Stars: 6
aadishgoel / Hadoop-Codes
A neat and handy place for all Hadoop code
hadoop mapreduce-java javaapi hdfs wordcount hadoop-hdfs hadoop-mapreduce hadoop-filesystem
Language: Java · Stars: 5
berksudan / Distributed-Environment-Installation-Guide
Install Hadoop, HDFS, YARN, and Spark on three Ubuntu 18.04 machines
hadoop hadoop-hdfs installation-guides spark virtual-machines
Stars: 5
hadoop-sandbox / hadoop-sandbox-images
Docker image builds for Hadoop sandbox.
docker hadoop hadoop-cluster hadoop-hdfs hadoop-yarn hdfs
Language: Dockerfile · Stars: 5
HxnDev / Finding-Average-Temperature-of-Each-Year-using-Hadoop-HDFS
In this task, we had to calculate the average temperature for each year from the given dataset using Hadoop HDFS. We had to create a MapReduce function to perform this task.
hadoop hadoop-mapreduce hadoop-hdfs hadoop-filesystem hadoop-cluster mapreduce mapreduce-java average-calculator code java
Language: Java · Stars: 5
Ren294 / Log-Analysis-Project
This project builds a scalable log analytics pipeline using the Lambda architecture for real-time and batch processing of NASA server logs.
big-data data-engineering data-science apache-kafka apache-nifi apache-spark cassandra cassandra-driver grafana hadoop hadoop-hdfs hive powerbi spark-rdd spark-sql spark-streaming big-data-analytics
Language: Python · Stars: 5
Reza-Marzban / Vehicle-Fuel-Hadoop-MapReduce
Vehicle Fuel Hadoop MapReduce
hadoop mapreduce mapreduce-java bigdata hadoop-mapreduce hadoop-hdfs
Language: Java · Stars: 5
waikeungt / hdfs-spring-boot-starter
A starter for quickly using HDFS in Spring Boot
hdfs spring-boot gradle java hadoop hadoop-hdfs
Language: Java · Stars: 4
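Several projects in this list (for example, HxnDev / Finding-Average-Temperature-of-Each-Year-using-Hadoop-HDFS) compute per-key aggregates with MapReduce. The following is a minimal, in-process Python sketch of that map, shuffle, and reduce flow for a per-year average temperature. It does not use Hadoop at all, and the comma-separated `year,temperature` input format and the function names are hypothetical assumptions, not taken from any of the listed repositories.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Mapper: parse each hypothetical 'year,temperature' record and emit (year, temperature)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank records
        year, temp = line.split(",")
        yield year.strip(), float(temp)


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Shuffle/sort: group all values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped: Dict[str, List[float]]) -> Dict[str, float]:
    """Reducer: average the temperatures collected for each year."""
    return {year: sum(temps) / len(temps) for year, temps in grouped.items()}


def average_temperature_per_year(lines: Iterable[str]) -> Dict[str, float]:
    """Run the three phases in sequence on an in-memory list of records."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    sample = ["1950,10.0", "1950,20.0", "1951,5.0"]
    print(average_temperature_per_year(sample))  # prints {'1950': 15.0, '1951': 5.0}
```

One design point carries over to a real Hadoop job: a combiner cannot simply average partial averages, because the mean of means is wrong when groups differ in size. A combiner would instead emit (sum, count) pairs, and the reducer would divide the total sum by the total count.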