Big Data refers to data sets so large and complex that traditional data processing applications struggle to handle them, which makes it computationally difficult to reveal patterns, trends, and associations, especially those relating to human behavior and interactions. Apache Hadoop is an open-source software framework for the distributed storage and processing of big data sets using the MapReduce programming model.
- In-Class Instruction: 4 Hours
- In-Class Code-Along Dataset: war_and_peace
- Installation of Hadoop
- Hands-on exercise with dataset
- Understand the motivation for Big Data
- Understand the storage layer underlying Big Data - HDFS
- Store and retrieve data in HDFS
- Big Data Motivation
- Introduction to Hadoop & Ecosystem
- Set up Cloudera environment
- Interaction with HDFS
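
As a rough illustration of the MapReduce programming model that Hadoop uses, the sketch below counts words in plain Python on a single machine. It is not Hadoop code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the two sample lines are assumptions made up for this example, and a real job would read its input from HDFS and run the phases across many nodes.

```python
import re
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    return [(word, 1) for word in re.findall(r"[a-z']+", line.lower())]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input standing in for a few lines of a text dataset.
sample_lines = ["War and peace", "peace and quiet"]
counts = word_count(sample_lines)
print(counts)
```

The three functions mirror the three stages a framework like Hadoop coordinates; in a distributed run, the map and reduce steps would execute in parallel on different blocks of the input, while the framework handles the shuffle between them.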