Hadoop with HDFS, YARN, MapReduce, Pig, Hive, Spark, Flink and more...
- This repository contains all the data, setup, and execution scripts used during the Udemy course "The Ultimate Hands-On Hadoop: Tame your Big Data!".
- Requires Hortonworks Sandbox 2.6.5. For a Docker script to run the sandbox locally, look into this repo.
- If you need a Terraform script to set up a sandbox environment in AWS, look into this repo.
- Introduction and ml-100k dataset
- MapReduce using Python
- Pig Latin and Pig
- Spark
- Hive
- HBase, MongoDB & Cassandra
- Drill, Phoenix, Presto
- YARN, Tez, Mesos, ZooKeeper, Oozie, Hue
- Real-time ingestion - Kafka and Flume
- Spark Streaming, Apache Storm & Apache Flink
- Impala, Apache Flume
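
To give a flavour of the MapReduce-with-Python topic on the ml-100k dataset, here is a minimal sketch that counts how often each rating value occurs. It is not one of the scripts in this repository and does not use Hadoop at all: it simulates the map, shuffle, and reduce phases in plain Python. It assumes the ml-100k `u.data` layout (tab-separated user id, movie id, rating, timestamp); the names `mapper`, `reducer`, and `run_job` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit (rating, 1) for one tab-separated u.data row."""
    user_id, movie_id, rating, timestamp = line.rstrip("\n").split("\t")
    yield rating, 1


def reducer(rating, counts):
    """Reduce phase: sum the counts collected for a single rating value."""
    yield rating, sum(counts)


def run_job(lines):
    """Simulate map -> shuffle (group by key) -> reduce in memory."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)  # shuffle: collect values per key

    result = {}
    for key in sorted(grouped):
        for out_key, out_value in reducer(key, grouped[key]):
            result[out_key] = out_value
    return result


# A few sample rows in the assumed u.data layout.
sample_rows = [
    "196\t242\t3\t881250949",
    "186\t302\t3\t891717742",
    "22\t377\t1\t878887116",
]

rating_counts = run_job(sample_rows)
print(rating_counts)
```

On a real cluster the shuffle step is handled by the framework, so only the mapper and reducer logic would carry over.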
All the technologies used throughout the course
- Introduction to the Hadoop Eco-system
- YARN
- HDFS
- MapReduce
- Pig
- Spark
- Hive
- Tez
- Mesos - Alternative cluster manager for YARN
- ZooKeeper
- Oozie
- Apache Drill
- Apache Phoenix
- Presto
- Sqoop
- Kafka
- Flume
- HBase
- Cassandra
- MongoDB
- Spark Streaming
- Storm
- Flink
- Apache Zeppelin
- Apache Superset
- Impala
- Accumulo
- Redis
- Ignite
- Elasticsearch
- Kinesis
- Apache NiFi
- Falcon
- Apache Slider
MIT © Murshid Azher.