
BigData Ecosystem Architecture

Internal working of Big Data and its ecosystem components, such as:

  • The background processes of resource allocation and database connections.
  • How the data is distributed across the nodes.
  • The execution life-cycle on submitting a job.

**Note: Refer to the links mentioned below under each ecosystem for a detailed explanation.**

1. HDFS 🐘

The various underlying processes that take place when a file is stored in HDFS (see the sketch after this list):

  • Type of scheduler

  • Block & Rack information

  • File size

  • File location

  • Replication information about the file (over-replicated blocks, under-replicated blocks, ...)

  • Health status of the file
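
A minimal Scala sketch of inspecting this file-level information through the Hadoop `FileSystem` API; the file path is hypothetical, and the configuration is assumed to come from the cluster's `core-site.xml`/`hdfs-site.xml`. (Health status and under-/over-replicated block counts are also reported on the command line by `hdfs fsck <path> -files -blocks -locations -racks`.)

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsFileInfo {
  def main(args: Array[String]): Unit = {
    // Picks up core-site.xml / hdfs-site.xml from the classpath.
    val conf = new Configuration()
    val fs   = FileSystem.get(conf)
    val path = new Path("/user/demo/input.txt") // hypothetical file

    val status = fs.getFileStatus(path)
    println(s"File size   : ${status.getLen} bytes")
    println(s"Block size  : ${status.getBlockSize} bytes")
    println(s"Replication : ${status.getReplication}")

    // Block & rack information: each block lists the DataNodes holding a
    // replica and their network topology paths (/rack/host).
    for (block <- fs.getFileBlockLocations(status, 0, status.getLen)) {
      println(s"Block @ offset ${block.getOffset} (length ${block.getLength})")
      println(s"  hosts: ${block.getHosts.mkString(", ")}")
      println(s"  racks: ${block.getTopologyPaths.mkString(", ")}")
    }
    fs.close()
  }
}
```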

Please click the link below to see the execution and flow process.

🔗 HDFS Architecture in Depth

2. SQOOP :octocat:

Sqoop is used to perform two main operations (see the sketch after this list):

  • Sqoop Import:

    • Ingests data from sources such as traditional relational databases into the Hadoop file system (HDFS).
  • Sqoop Export:

    • Exports data from the Hadoop file system (HDFS) back into traditional relational databases.
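
A rough sketch of both operations, wrapped in Scala's `sys.process` so the examples stay in one language; the host, credentials, table names, and paths are all placeholders.

```scala
import scala.sys.process._

object SqoopImportExport {
  // Hypothetical connection details; replace with your own.
  val connect = Seq("--connect", "jdbc:mysql://dbhost:3306/sales",
                    "--username", "etl",
                    "--password-file", "/user/etl/.sqoop.pwd")

  def main(args: Array[String]): Unit = {
    // Import: RDBMS table -> HDFS directory, split across 4 map tasks.
    (Seq("sqoop", "import") ++ connect ++ Seq(
      "--table", "orders",
      "--target-dir", "/user/etl/orders",
      "--num-mappers", "4")).!

    // Export: HDFS directory -> RDBMS table.
    (Seq("sqoop", "export") ++ connect ++ Seq(
      "--table", "orders_summary",
      "--export-dir", "/user/etl/orders_summary")).!
  }
}
```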

To support these two operations, Sqoop internally uses a code generator (see the sketch after this list):

  • Sqoop CodeGen:

    • Compiles the table's metadata and related information into a Java class file and packages it into a JAR.
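
A sketch of invoking the code generator directly (same hypothetical connection as above); it writes the generated `orders.java` to `--outdir` and the compiled class plus JAR to `--bindir`:

```scala
import scala.sys.process._

object SqoopCodeGen {
  def main(args: Array[String]): Unit = {
    Seq("sqoop", "codegen",
        "--connect", "jdbc:mysql://dbhost:3306/sales",
        "--username", "etl",
        "--password-file", "/user/etl/.sqoop.pwd",
        "--table", "orders",
        "--outdir", "/tmp/sqoop-src",   // generated orders.java
        "--bindir", "/tmp/sqoop-bin").! // compiled .class and .jar
  }
}
```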

Please click the link below to see the execution and flow process.

🔗 SQOOP Architecture in Depth

3. HIVE 🐝

Hive has four main components (see the sketch after this list):

  • Hadoop core components (HDFS, MapReduce)

  • Metastore

  • Driver

  • Hive Clients
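
A minimal client-side sketch, assuming a HiveServer2 endpoint at `dbhost:10000`, a hypothetical `orders` table, and the `hive-jdbc` driver on the classpath: the client submits HiveQL, the Driver compiles it against the Metastore's table metadata, and the resulting jobs run on the Hadoop core components.

```scala
import java.sql.DriverManager

object HiveClientDemo {
  def main(args: Array[String]): Unit = {
    // Connect to HiveServer2; the Driver behind it plans and runs the query.
    val conn = DriverManager.getConnection(
      "jdbc:hive2://dbhost:10000/default", "etl", "")
    val stmt = conn.createStatement()
    val rs   = stmt.executeQuery("SELECT COUNT(*) FROM orders")
    while (rs.next()) println(s"row count = ${rs.getLong(1)}")
    rs.close(); stmt.close(); conn.close()
  }
}
```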

Please click the link below to see the execution and flow process.

🔗 HIVE Architecture in Depth

4. SPARK 💥

The various phases involved before and during the execution of a Spark job (see the sketch after this list):

  • Spark Context

    • It is the heart of a Spark application.
  • YARN Resource Manager, Application Master & launching of executors (containers).

  • Setting up environment variables, job resources.

  • CoarseGrainedExecutorBackend & Netty-based RPC.

  • SparkListeners.

    • LiveListenerBus
    • StatsReportListener
    • EventLoggingListener
  • Execution of a job

    • Logical Plan (Lineage)
    • Physical Plan (DAG)
  • Spark-WebUI.
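
A minimal job sketch under assumed settings (YARN submission, hypothetical input path, app and class names made up for illustration). Submitted with something like `spark-submit --master yarn --deploy-mode cluster --num-executors 4 --executor-memory 2g --class WordCount app.jar`, the driver's SparkContext asks the ResourceManager for containers, the ApplicationMaster launches executors (`CoarseGrainedExecutorBackend`), and events flow through the `LiveListenerBus` to the listeners and the Web UI.

```scala
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WordCount")
      .config("spark.eventLog.enabled", "true") // feeds EventLoggingListener
      .getOrCreate()
    val sc = spark.sparkContext // the heart of the application

    val counts = sc.textFile("hdfs:///user/demo/input.txt") // hypothetical path
      .flatMap(_.split("\\s+"))  // transformations only extend the logical
      .map(word => (word, 1))    // plan (lineage); nothing runs yet
      .reduceByKey(_ + _)        // shuffle boundary => new stage in the DAG

    counts.take(10).foreach(println) // action triggers the physical plan (DAG)
    spark.stop()
  }
}
```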

Please click the link below to see the execution and flow process.

🔗 SPARK Architecture in Depth

4.1 SPARK Abstraction Layers & Internal Optimization Techniques Used 💥

Spark has three abstraction variants (see the sketch after this list):

  • RDD (Resilient Distributed Datasets)

    • Lineage Graph
    • DAG Scheduler
  • DataFrames

    • Catalyst Optimizer
    • Tungsten Engine
    • Default source or Base relation
  • Datasets

    • Optimized Tungsten Engine - V2
    • Whole Stage Code Generation
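
A small local sketch contrasting the three variants (data and names are hypothetical): `toDebugString` prints an RDD's lineage graph, and `explain` shows the Catalyst-optimized physical plan, with whole-stage-code-generated operators marked by `*` in the output.

```scala
import org.apache.spark.sql.SparkSession

object AbstractionLayers {
  case class Sale(item: String, amount: Double)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("layers").master("local[*]").getOrCreate()
    import spark.implicits._

    // RDD: every transformation extends the lineage graph, which the
    // DAG scheduler later cuts into stages.
    val rdd = spark.sparkContext.parallelize(Seq(1, 2, 3, 4)).map(_ * 2).filter(_ > 4)
    println(rdd.toDebugString)

    // DataFrame: untyped rows; Catalyst optimizes the plan, Tungsten executes it.
    val df = Seq(Sale("a", 10.0), Sale("b", 20.0)).toDF()
    df.groupBy("item").sum("amount").explain()

    // Dataset: typed API on the same engine, with whole-stage code generation.
    val ds = df.as[Sale].filter(_.amount > 5.0)
    ds.show()

    spark.stop()
  }
}
```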

5. HBASE 🐋

About

Life-cycle: Internal working of HDFS, SQOOP, HIVE, SPARK, HBASE, KAFKA with code.

License: MIT