There are 14 repositories under the hadoop-hdfs topic.
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.
More than 2,000 data engineer interview questions.
MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Learn how to use Spark SQL and the HSpark connector package to create and query data tables that reside in HBase region servers
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Big data projects implemented by Maniram Yadav
Open source data infrastructure platform. Designed for developers, built for speed.
Big data analysis of a travel website (partial Ctrip data): a Hadoop course project (undergraduate coursework level)
A fully functional Hadoop YARN cluster as a docker-compose deployment.
Twitter + Flume + Hadoop (HDFS, MapReduce) + Neo4j + Python
Ansible playbook for setting up Hadoop HDFS
Helm chart for Apache Hadoop using multi-arch docker images
A Java HDFS client example and a full Kerberos example for calling Hadoop commands directly from Java code or on your local machine.
Toy Hadoop cluster combining various SQL-on-Hadoop variants
A MapReduce program to conduct sentiment analysis of a keyword from a list of comments.
Data Engineering Project with Hadoop HDFS and Kafka
Instructions on setting up Hadoop, HDFS, Java, sbt, Kafka, Scala, Spark, and Flume on Ubuntu 18.04
Mammoth is a container-based Hadoop distributed system log analyzer. Sponsored by Mantech and Naver Cloud Platform.
Installation and configuration of Hadoop on Google Colaboratory
Repository containing Docker images for creating a Spark cluster on Hadoop YARN.
Neat and Handy Place for all Hadoop codes
In this task, we had to write a MapReduce program to analyze the sentiment of a keyword from a list of comments. This was done using Hadoop HDFS.
Python automation in Linux
Install Hadoop, HDFS, YARN, and Spark on 3 Ubuntu 18.04 machines
In this task, we had to calculate the average temperature for each year from the given dataset using Hadoop HDFS. We had to create a MapReduce function to perform this task.
Vehicle Fuel Hadoop MapReduce
In this task, we had to find the average length of comments given in the dataset. It was done using Hadoop MapReduce and Hadoop HDFS.
Bootcamp taught by IGTI that intensively covers the concepts and practices of data analysis, preparing students to work professionally in the field.
Repository created to store notes and activities from the training on the Digital Innovation One (DIO) platform for the Data Engineer selection process run by Everis.
An attempt to make a reliable, distributed file system inspired by HDFS
A starter for quick and easy use of HDFS in Spring Boot