Big Data and Hadoop

Big data refers to data sets so large and complex that traditional data processing applications struggle to handle them, which makes it computationally difficult to reveal patterns, trends, and associations, especially those relating to human behavior and interactions. Apache Hadoop is an open-source software framework for the distributed storage and processing of big data sets using the MapReduce programming model.
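
To make the MapReduce programming model concrete, here is a minimal single-machine sketch of a word count in plain Python. It does not use Hadoop; it only imitates the map, shuffle, and reduce steps that the framework would distribute across a cluster. The function names and the two sample lines are illustrative assumptions, not part of the class material.

```python
import re
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word, 1) for word in re.findall(r"[a-z']+", line.lower())]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical stand-in for lines read from the war_and_peace dataset.
sample_lines = ["War and peace", "peace and quiet"]
counts = word_count(sample_lines)
print(counts)
```

In a real Hadoop job the map and reduce functions run on many machines in parallel and the shuffle moves data between them over the network; the structure of the computation is the same.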

At a glance

In Class Instruction: 4 Hours
- In Class Code-Along Dataset: war_and_peace

In Class Activity

Installation of Hadoop
Hands-on exercise with dataset

Pre Reads

Learning Objectives

Understand the motivation for Big Data
Understand the storage layer underlying Big Data - HDFS
Store and retrieve data in HDFS
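
As a rough illustration of the last objective, the sketch below builds the `hdfs dfs` shell commands typically used to store a file in HDFS and retrieve it again. The local file name `war_and_peace.txt` and the HDFS directory `/user/cloudera/in_class` are assumptions made for this example; the class environment may use different paths. The commands are executed only if an `hdfs` binary is found on the PATH; otherwise they are printed.

```python
import shutil
import subprocess


def hdfs_commands(local_file, hdfs_dir):
    """Build the argument lists for a simple store-and-retrieve round trip."""
    file_name = local_file.rsplit("/", 1)[-1]
    hdfs_file = f"{hdfs_dir}/{file_name}"
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],          # create the target directory
        ["hdfs", "dfs", "-put", local_file, hdfs_dir],      # store: local -> HDFS
        ["hdfs", "dfs", "-ls", hdfs_dir],                   # confirm the file is listed
        ["hdfs", "dfs", "-get", hdfs_file, "retrieved_" + file_name],  # retrieve: HDFS -> local
    ]


# Hypothetical paths; adjust to the actual class setup.
commands = hdfs_commands("war_and_peace.txt", "/user/cloudera/in_class")

if shutil.which("hdfs"):
    # A Hadoop client is installed, so run each command and stop on the first failure.
    for command in commands:
        subprocess.run(command, check=True)
else:
    # No Hadoop client available: show what would be run.
    for command in commands:
        print(" ".join(command))
```

Running the same commands by hand in a terminal works equally well; wrapping them in Python only makes the sequence easy to repeat.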

Agenda

Big Data Motivation
Introduction to Hadoop & Ecosystem
Set up Cloudera environment
Interaction with HDFS

Slides

Big Data and Hadoop Introduction

Post Reads
