Spark: The Definitive Guide
Part I. General Overview of Big Data and Spark
- What Is Apache Spark?
- A Gentle Introduction to Spark
- A Tour of Spark's Toolset
Part II. Structured APIs - DataFrames, SQL, and Datasets
- Structured API Overview
- Basic Structured Operations
- Working with Different Types of Data
- Aggregations
- Joins
- Data Sources
- Spark SQL
- Datasets
Part III. Low-Level APIs
- Resilient Distributed Datasets (RDDs)
- Advanced RDDs
- Distributed Shared Variables
Part IV. Production Applications
- How Spark Runs on a Cluster
- Developing Spark Applications
- Deploying Spark
- Monitoring and Debugging
- Performance Tuning
Part V. Streaming
- Stream Processing Fundamentals
- Structured Streaming Basics
- Event-Time and Stateful Processing
- Structured Streaming in Production
Part VI. Advanced Analytics and Machine Learning
- Advanced Analytics and Machine Learning Overview
- Preprocessing and Feature Engineering
- Classification
- Regression
- Recommendation
- Unsupervised Learning
- Graph Analytics
- Deep Learning
Part VII. Ecosystem
- Language Specifics: Python (PySpark) and R (SparkR and sparklyr)