Big Data Engineer
This repository documents and stores my notes from the Semantix Big Data Engineer training.
Summary
-
- Jupyter Notebook
- Spark session
- API Catalog
- Exercise: setup environment
- Exercise: setup Jupyter
- Reading a CSV
- RDD
- Schema Handling
- Datasets
- The `withColumn` command
- Spark application
- Spark Streaming
- Spark Streaming with Kafka
- Structured Streaming
- Application Optimizations
- Structured Streaming with Kafka