DataPulse: Platform For Big Data & AI

Features

Spark Application Deployment
- Jar Application Submission
- PySpark Application Submission
- Jupyter Notebook
  - Customized Integration with PySpark

Monitoring
- Spark UI
- History Server

cp bin/env_template.yaml bin/env.yaml

Fill in bin/env.yaml with your own configuration values.
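
Re-running the copy step replaces any bin/env.yaml that has already been edited. A minimal sketch of a guarded copy follows. The init_env helper is hypothetical and not part of DataPulse; it assumes only the two file names used in this README.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DataPulse): copy the template only
# when no env.yaml exists yet, so local edits are never overwritten.
init_env() {
    # Directory holding the template; defaults to bin, as in this README.
    dir="${1:-bin}"

    if [ ! -f "$dir/env_template.yaml" ]; then
        echo "missing $dir/env_template.yaml" >&2
        return 1
    fi

    if [ -f "$dir/env.yaml" ]; then
        echo "$dir/env.yaml already exists; leaving it untouched"
        return 0
    fi

    cp "$dir/env_template.yaml" "$dir/env.yaml"
    echo "created $dir/env.yaml; edit it before running setup"
}
```

To use it, call init_env from the repository root and then edit bin/env.yaml as described above.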

source bin/setup.sh

A service notebook will be created on the Kubernetes cluster.

Check Spark information by running the following code in a notebook cell:

start()

Open the Spark UI by clicking the link in the notebook cell output.

This project is licensed under the terms of the MIT license.

DataPulse is a platform for developers to build, schedule and monitor data pipelines.
