databricks / LearningSparkV2

This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

Home Page: https://learning.oreilly.com/library/view/learning-spark-2nd/9781492050032/


Learning Spark 2nd Edition

Welcome to the GitHub repo for Learning Spark 2nd Edition.

Chapters 2, 3, 6, and 7 contain stand-alone Spark applications. You can build the JAR files for every chapter at once by running the Python script `python build_jars.py`, or you can cd into an individual chapter directory and build its JARs as described in that chapter's README. Also, add $SPARK_HOME/bin to your $PATH so that you don't have to prefix spark-submit with $SPARK_HOME/bin/ when running these stand-alone applications.
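In a shell, the PATH setup above is simply `export PATH="$SPARK_HOME/bin:$PATH"`. If you prefer to launch the applications from a Python driver script instead, the following is a minimal sketch of the same idea. It is not part of this repo: `add_spark_bin_to_path` and `spark_submit_argv` are hypothetical helper names, `app.jar` and `com.example.App` are placeholders, and it assumes SPARK_HOME points at a Spark installation on a POSIX system. The `--class` option is the standard spark-submit flag for naming a JVM application's main class.

```python
import os
import shutil
from typing import List, Optional


def add_spark_bin_to_path(spark_home: Optional[str] = None) -> str:
    """Prepend $SPARK_HOME/bin to PATH for this process (hypothetical helper)."""
    home = spark_home or os.environ.get("SPARK_HOME")
    if not home:
        raise EnvironmentError("SPARK_HOME is not set")
    bin_dir = os.path.join(home, "bin")
    current = os.environ.get("PATH", "")
    parts = current.split(os.pathsep) if current else []
    # Only prepend once, so repeated calls leave PATH unchanged.
    if bin_dir not in parts:
        os.environ["PATH"] = os.pathsep.join([bin_dir] + parts)
    return os.environ["PATH"]


def spark_submit_argv(jar_path: str, main_class: str, *app_args: str) -> List[str]:
    """Build a spark-submit command line that relies on PATH lookup (hypothetical helper)."""
    exe = shutil.which("spark-submit")  # resolved against the current PATH
    if exe is None:
        raise FileNotFoundError("spark-submit not found on PATH")
    # --class names the application's main class; anything after the JAR goes to the app.
    return [exe, "--class", main_class, jar_path, *app_args]
```

You could then run an application with `subprocess.run(spark_submit_argv("app.jar", "com.example.App"), check=True)`. Modifying `os.environ` affects only the current process and its children, which is why the shell `export` (for example in your shell profile) remains the simpler option for interactive use.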

For all the other chapters, we have provided notebooks in the notebooks folder. We have also included notebook equivalents for a few of the stand-alone Spark applications in the aforementioned chapters.

Have fun, cheers!

About


License: Apache License 2.0


Languages

Scala 49.7%, Python 30.7%, Java 19.6%