payneal / Spark-The_Definitive_Guide

Notes from Spark: The Definitive Guide

Spark: The Definitive Guide

Part I. General Overview of Big Data and Spark

  1. What Is Apache Spark?
  2. A Gentle Introduction to Spark
  3. A Tour of Spark's Toolset

Part II. Structured APIs - DataFrames, SQL, and Datasets

  1. Structured API Overview
  2. Basic Structured Operations
  3. Working with Different Types of Data
  4. Aggregations
  5. Joins
  6. Data Sources
  7. Spark SQL
  8. Datasets

Part III. Low-Level APIs

  1. Resilient Distributed Datasets (RDDs)
  2. Advanced RDDs
  3. Distributed Shared Variables

Part IV. Production Applications

  1. How Spark Runs on a Cluster
  2. Developing Spark Applications
  3. Deploying Spark
  4. Monitoring and Debugging
  5. Performance Tuning

Part V. Streaming

  1. Stream Processing Fundamentals
  2. Structured Streaming Basics
  3. Event-Time and Stateful Processing
  4. Structured Streaming In Production

Part VI. Advanced Analytics and Machine Learning

  1. Advanced Analytics and Machine Learning Overview
  2. Preprocessing and Feature Engineering
  3. Classification
  4. Regression
  5. Recommendation
  6. Unsupervised Learning
  7. Graph Analytics
  8. Deep Learning

Part VII. Ecosystem

  1. Language Specifics: Python (PySpark) and R (SparkR and sparklyr)


Languages

Language:Python 100.0%