
Kubeflow Spark

Orchestrate Spark jobs using Kubeflow, a modern machine-learning orchestration framework. A related blog post describes the approach in more detail.

Requirements

  1. Kubernetes cluster (1.17+)
  2. Kubeflow pipelines (1.7.0+)
  3. Spark Operator (1.1.0+)
  4. Python (3.6+)
  5. kubectl
  6. helm3

Getting started

Run make all to start everything and skip to step 6, or follow the steps individually:

  1. Start your local cluster:
./scripts/start-minikube.sh
  2. Install Kubeflow Pipelines:
./scripts/install-kubeflow.sh
  3. Install the Spark Operator:
./scripts/install-spark-operator.sh
  4. Create the Spark Service Account and add its permissions:
./scripts/add-spark-rbac.sh
  5. Make the Kubeflow UI reachable:
  • a. (Optional) Add a Kubeflow UI Ingress:
./scripts/add-kubeflow-ui-ingress.sh
  • b. (Optional) Forward the service port, e.g.:
kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8005:80
  6. Create the Kubeflow pipeline definition file (a sketch of what such a definition can look like follows this list):
python kubeflow_pipeline.py
  7. Navigate to the Pipelines UI and upload the newly created pipeline from the file spark_job_pipeline.yaml.
  8. Trigger a pipeline run. Make sure to set spark-sa as the Service Account for the execution (a programmatic alternative is sketched after this list).
  9. Enjoy your orchestrated Spark job execution!
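
The pipeline definition itself is compiled by kubeflow_pipeline.py (step 6). As a rough sketch of what such a definition can look like, not this repository's exact code: with the kfp v1 SDK, a kfp.dsl.ResourceOp can apply a SparkApplication manifest and then poll the resource until a success or failure condition matches. All manifest values below (names, namespace, image, example Spark job) are illustrative assumptions.

import kfp
import kfp.dsl as dsl


@dsl.pipeline(
    name="spark-job-pipeline",
    description="Submit a Spark job via the Spark Operator and poll its status.",
)
def spark_job_pipeline():
    # Hypothetical SparkApplication manifest; image, class, jar and namespace
    # are placeholders, not values taken from this repository.
    spark_application = {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": "spark-pi", "namespace": "kubeflow"},
        "spec": {
            "type": "Scala",
            "mode": "cluster",
            "image": "gcr.io/spark-operator/spark:v3.1.1",
            "mainClass": "org.apache.spark.examples.SparkPi",
            "mainApplicationFile": "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar",
            "sparkVersion": "3.1.1",
            "driver": {"cores": 1, "memory": "512m", "serviceAccount": "spark-sa"},
            "executor": {"cores": 1, "instances": 1, "memory": "512m"},
        },
    }

    # ResourceOp creates the resource, then polls it until the success or
    # failure condition is met, i.e. until the Spark job finishes.
    dsl.ResourceOp(
        name="submit-spark-job",
        k8s_resource=spark_application,
        action="create",
        success_condition="status.applicationState.state == COMPLETED",
        failure_condition="status.applicationState.state == FAILED",
    )


if __name__ == "__main__":
    # Produces the spark_job_pipeline.yaml uploaded in step 7.
    kfp.compiler.Compiler().compile(spark_job_pipeline, "spark_job_pipeline.yaml")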
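Instead of triggering the run from the UI (step 8), a run can also be started programmatically. A minimal sketch, assuming the port-forward from step 5b and a kfp 1.x SDK version in which create_run_from_pipeline_package accepts a service_account argument:

import kfp

# Host matches the port-forward from step 5b; adjust for an Ingress setup.
client = kfp.Client(host="http://localhost:8005")

# service_account becomes the Kubernetes Service Account of the pipeline's
# pods, so the SparkApplication is created with the RBAC permissions that
# ./scripts/add-spark-rbac.sh granted to spark-sa.
client.create_run_from_pipeline_package(
    "spark_job_pipeline.yaml",
    arguments={},
    run_name="spark-job-run",
    service_account="spark-sa",
)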

About

Orchestrate Spark Jobs from Kubeflow Pipelines and poll for the status.

License: Apache License 2.0


Languages

Python: 62.1%
Shell: 29.8%
Makefile: 8.1%