ricklamers / orchest-hello-spark

This repo shows how to run (Py)Spark in Orchest (locally)

Orchest: Hello Spark

This repo shows how to run (Py)Spark in Orchest (locally).

For details on how Spark is installed, see setup_script.sh. The Spark code itself is a minimal example that counts the words in the Python LICENSE text file. Check out the notebook for the code.
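As a rough illustration of what such a word count can look like, here is a minimal sketch. It is not the notebook's actual code: the helper names (`tokenize`, `count_words`, `run_local`), the app name `hello-spark`, and the tokenization rule are assumptions made for this example.

```python
import re
from operator import add


def tokenize(line):
    """Split a line into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", line.lower())


def count_words(sc, path):
    """Return (word, count) pairs for the text file at `path`, most frequent first."""
    return (
        sc.textFile(path)                 # one record per line
        .flatMap(tokenize)                # one record per word
        .map(lambda word: (word, 1))      # pair each word with a count of 1
        .reduceByKey(add)                 # sum the counts per word
        .sortBy(lambda pair: pair[1], ascending=False)
        .collect()
    )


def run_local(path):
    """Count words in `path` using a local Spark context (assumed setup)."""
    import pyspark  # imported lazily so the helpers above load without Spark

    conf = pyspark.SparkConf().setMaster("local[*]").setAppName("hello-spark")
    sc = pyspark.SparkContext(conf=conf)
    try:
        return count_words(sc, path)
    finally:
        sc.stop()  # always release the context, even on failure
```

Calling `run_local("LICENSE")[:10]` would return the ten most frequent words, assuming a file named `LICENSE` exists in the working directory.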

To connect to a cluster instead, use a different PySpark context initializer:

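import pyspark

# Point the driver at the standalone cluster's master instead of local mode.
conf = pyspark.SparkConf()
conf.setMaster('spark://head_node:7077')
# Enable shared-secret authentication; the secret must match the cluster's.
conf.set('spark.authenticate', 'true')
conf.set('spark.authenticate.secret', 'secret-key')
sc = pyspark.SparkContext(conf=conf)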
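One possible way to avoid hard-coding the master URL and secret is to build the settings in a small helper and pass them in from outside, for example from environment variables. The sketch below is only a suggestion: `cluster_settings`, `connect`, and the app name `hello-spark` are hypothetical names that do not appear in the repo.

```python
def cluster_settings(master_url, secret=None):
    """Build the (key, value) pairs for a Spark master, optionally with auth."""
    pairs = [("spark.master", master_url)]
    if secret:
        # Shared-secret authentication; the value must match the cluster's.
        pairs.append(("spark.authenticate", "true"))
        pairs.append(("spark.authenticate.secret", secret))
    return pairs


def connect(master_url, secret=None):
    """Create a SparkContext for `master_url` (hypothetical helper)."""
    import pyspark  # imported lazily so cluster_settings works without Spark

    conf = pyspark.SparkConf().setAppName("hello-spark")
    conf.setAll(cluster_settings(master_url, secret))
    return pyspark.SparkContext(conf=conf)
```

For instance, `connect("local[*]")` would start a local context, while `connect("spark://head_node:7077", os.environ["SPARK_AUTH_SECRET"])` would target the cluster, where `SPARK_AUTH_SECRET` is an assumed variable name.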

Pipeline

(Image: PySpark pipeline)
