IndustrialDataops / notebook-as-pipeline

Running jupyter notebooks as pipelines

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Notebook as pipeline

Taking the advantage of papermill and testbook , we can use the existing jupyter notebooks to convert them into as pipelines and also perform unit testings without making any changes in notebook

pipeline

Pipeline

Below will be the folder structure of the project

│───.dockerignore
│───.gitignore
│───Dockerfile
│───Dockerfile.test
│───executor.py
│───parameters.yaml
│───pytest.ini
│───README.md
│───requirements.txt
│───runspec.yaml
│───test_requirements.txt
├───notebooks
│   │   Sample.ipynb
├───outputs
│   ├───2021-11-24
│   │       Sample_202111242301.ipynb
│   ├───2021-11-25
│   │       Sample_202111251600.ipynb
│   └───2021-11-27
│           Sample_202111270452.ipynb
└───tests
    │   sample_test.py
    │   __init__.py

The following are the components for the pipeline

  • notebooks/          -- containing the notebooks to run
  • parameters.yaml -- variables parameterized in notebooks will be stored in this and will be passed                                  during runtime
  • runspec.yaml       -- this will be used to store the order in which the jupyter notebooks should run                                   and also contains the run time configurations,
  • executor.py          -- python program which executes the notebooks by taking the inputs from runspec and parameters files
  • requirements.txt  -- python dependencies for workflow
  • Dockerfile             -- dockerfile to run pipeline

Papermill can take inputs from and send output notebooks to local, http , s3 , azure and gcp storage. In the demo we used local storage

Sample run

PS C:\work> docker run -v C:\work\notebook-as-pipeline:/pipelines -it nap:1.0 -s .\runspec.yaml -p .\parameters.yaml
Executing: 100%|████████████████████████████████████████████████████████████████████| 7/7 [00:02<00:00,  2.51cell/s]

Testing

Testbook adds the advantage of writing unit tests to notebooks without need to add any inline unit tests in notebooks.We used pytest to run unit tests

Sample run

PS C:\work> docker run -it nap_test:1.0
==================================== test session starts =======================================
platform linux -- Python 3.9.6, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /pipelines, configfile: pytest.ini
plugins: cov-3.0.0
collected 1 item

tests/sample_test.py .                   [100%] 

==================================== 1 passed in 2.52s ==========================================

TODO

About

Running jupyter notebooks as pipelines


Languages

Language:Jupyter Notebook 93.2%Language:Python 5.3%Language:Dockerfile 1.5%