melissakou / Notes-Big-Data-Essentials

Notes for the Coursera course: Big Data Essentials - HDFS, MapReduce and Spark RDD

Notes_BigDataEssentials

These are notes for the Coursera course: Big Data Essentials - HDFS, MapReduce and Spark RDD.

Outline

latest update: 2021.01.11

| Week | Status | Content |
| --- | --- | --- |
| Week1 | ✔️ Completed | - Unix Command Line Interface (CLI) |
| | | - Distributed File Systems (DFS), HDFS (Hadoop DFS) Architecture and Scalability Problems |
| | | - Tuning Distributed Storage Platform with File Types |
| Week2 | ✔️ Completed | - Hadoop MapReduce: How to Build Reliable System from Unreliable Components |
| | | - Hadoop MapReduce Streaming Applications in Python |
| | | - Hadoop MapReduce Application Tuning: Job Configuration, Comparator, Combiner, Partitioner |
| Week3 | 📌 NotStarted | - Hadoop MapReduce Application Tuning: Job Configuration, Comparator, Combiner, Partitioner |
| Week4 | 📌 NotStarted | - Core concepts and abstractions |
| | | - Advanced topics |
| | | - Working with Spark in Python |
| Week5 | 📌 NotStarted | - Working with Spark in Python |
| Week6 | 📌 NotStarted | - Working with samples |
| | | - Telecommunications Analytics |
| | | - Working with social graphs |
