Rock The JVM - Apache Spark Essentials

Home page: https://rockthejvm.com/p/spark-essentials

Master Spark's core APIs with Scala.

Certificate

Certificate of Completion

Sections

  1. Scala Recap
  2. DataFrames
  3. Types and Datasets
  4. Spark SQL
  5. Low-Level API and RDDs
  6. Clusters
  7. Big Data
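
To give a flavor of the APIs these sections cover, here is a minimal, self-contained sketch touching DataFrames, Datasets, Spark SQL and RDDs. It is not taken from the course code; the data path and column names are illustrative only:

import org.apache.spark.sql.SparkSession

// a typed view of a few columns; the fields are illustrative
case class Car(Name: String, Horsepower: Option[Long])

object Playground {
  def main(args: Array[String]): Unit = {
    // local SparkSession, as you would use when running lessons from IntelliJ
    val spark = SparkSession.builder()
      .appName("Spark Essentials Playground")
      .master("local[*]")
      .getOrCreate()

    import spark.implicits._

    // DataFrames (section 2): untyped rows with a schema
    val carsDF = spark.read
      .option("inferSchema", "true")
      .json("src/main/resources/data/cars.json") // illustrative path

    // Datasets (section 3): the same data, typed
    val carsDS = carsDF.select($"Name", $"Horsepower").as[Car]
    carsDS.filter(_.Horsepower.exists(_ > 150)).show()

    // Spark SQL (section 4): register a view and query it
    carsDF.createOrReplaceTempView("cars")
    spark.sql("select Name, Horsepower from cars where Horsepower > 150").show()

    // RDDs (section 5): the low-level API
    val numbersRDD = spark.sparkContext.parallelize(1 to 1000)
    println(numbersRDD.map(_ * 2).reduce(_ + _))

    spark.stop()
  }
}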

IntelliJ IDEA

Docker

Postgres Database Container

$ docker compose up

In another shell:

$ ./psql.sh
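
Spark connects to this database over JDBC in several lessons. Below is a minimal read sketch, assuming the container exposes Postgres on localhost:5432; the database name, credentials and table are placeholders, so check docker-compose.yml and the SQL init scripts in this repo for the real values:

import org.apache.spark.sql.SparkSession

object PostgresReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Read from the Postgres container")
      .master("local[*]")
      .getOrCreate()

    // every connection detail below is a placeholder, not the course's actual config
    val employeesDF = spark.read
      .format("jdbc")
      .option("driver", "org.postgresql.Driver")
      .option("url", "jdbc:postgresql://localhost:5432/somedb") // assumed host/port/db
      .option("user", "someuser")            // placeholder
      .option("password", "somepassword")    // placeholder
      .option("dbtable", "public.employees") // placeholder table
      .load()

    employeesDF.printSchema()
    employeesDF.show()
    spark.stop()
  }
}

Reading over JDBC also requires the Postgres driver (org.postgresql:postgresql) on the classpath, for instance as a dependency in build.sbt.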

Spark Cluster Container

Build the Spark images for the master, worker and submit containers (do this once):

$ cd spark-cluster
$ ./build-images.sh

Start a Spark cluster with 1 worker:

$ docker compose up --scale spark-worker=1

Connect to the master node and run the Spark SQL shell:

$ docker exec -it spark-cluster-spark-master-1 bash
$ cd spark/
$ ./bin/spark-sql

Or run a Spark shell:

$ ./bin/spark-shell

This starts the Spark web UI, listening on localhost:4040.
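
Anything you run in the shell shows up there as jobs and stages. For example, typed at the scala> prompt (the shell pre-defines spark as a SparkSession and sc as a SparkContext):

// a couple of actions to generate jobs visible at localhost:4040
val ds = spark.range(1, 1000000)     // Dataset[java.lang.Long]
ds.selectExpr("sum(id)").show()      // action: appears in the Jobs tab

val rdd = sc.parallelize(1 to 10000)
rdd.map(_ * 2).reduce(_ + _)         // another job, with its own stages and tasks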


Among the other tools in Spark's bin/ directory, we can also run the R and Python shells, submit applications, and more:

$ /spark/bin/sparkR
$ /spark/bin/pyspark
$ /spark/bin/spark-submit
$ /spark/bin/beeline
$ /spark/bin/find-spark-home
$ /spark/bin/spark-class
$ /spark/bin/spark-class2.cmd
$ /spark/bin/run-example
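
Unlike the shells, spark-submit expects a packaged application. A minimal sketch of an app you could build into a JAR and submit to the cluster follows; the object name is illustrative, and the master URL is deliberately left out so spark-submit can supply it:

import org.apache.spark.sql.SparkSession

// illustrative application for spark-submit; not part of the course code
object SubmitSketch {
  def main(args: Array[String]): Unit = {
    // no .master(...) here: spark-submit provides it, e.g. --master spark://<master-host>:7077
    val spark = SparkSession.builder()
      .appName("Submit Sketch")
      .getOrCreate()

    // a trivial job so something shows up in the cluster UI
    spark.range(1, 1000000).selectExpr("count(*) as cnt").show()

    spark.stop()
  }
}

You would package the project (for instance with sbt package) and pass the resulting JAR to spark-submit, with --class pointing at the object above.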

License: MIT

