AIDevSecOps with Thoth and Elyra

Elyra AIDevSecOps Tutorial

This tutorial discusses the interface between Data Science and Dev/DevOps using project templates, pipelines and bots. Moreover, it aims to show that Data Scientists are not so different from developers, and that DevSecOps practices and tools can be applied to MLOps ones.

The demo application used is the "hello world" for AI: MNIST Classification.

Environment required

Running this tutorial requires the following environment:

  • Open Data Hub v1.0.
  • OpenShift (enterprise Kubernetes).
  • Cloud Object Storage (e.g. Ceph, MinIO).
  • Tekton, used in CI/CD systems to run pipelines created by humans or machines.
  • ArgoCD, used for Continuous Deployment of your applications.
  • Tutorial container image:
jupyterhub==1.2.1
jupyterlab>=3
elyra>=2
jupyterlab-requirements>=0.4.5
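As a sketch of how Tekton runs a pipeline step, a minimal Task and TaskRun might look like the following. All names, the image reference, and the script path are illustrative, not taken from this tutorial's actual manifests:

```yaml
# Hypothetical Tekton Task: runs a single training step in a container.
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: train-model              # assumed name
spec:
  steps:
    - name: train
      image: quay.io/example/tutorial-image:latest   # placeholder image
      script: |
        python3 src/train.py     # placeholder entry point
---
# A TaskRun triggers one execution of the Task above.
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: train-model-run-1
spec:
  taskRef:
    name: train-model
```

Tekton executes each step as a container in a pod, which is what lets both humans and bots trigger the same pipeline definitions.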

Operate First Open Environment

The Operate First open infrastructure environment has been selected to run this tutorial. It fulfills all the requirements stated above. If you are interested in using it, just get in touch with the Operate First team; it is an open source initiative.

You can also find notes regarding other environments in the different sections of the tutorial.

Tools

In this tutorial the following technologies are going to be used:

  • JupyterHub, to launch images with Jupyter tooling.
  • Elyra, a set of AI-centric extensions to JupyterLab Notebooks (e.g. interfaces with Kubeflow Pipelines, Git, Python scripts).
  • Project Thoth Extension for Dependency Management on JupyterLab. If you want to know more, have a look at this repo.
  • Kubeflow Pipelines, to enable end-to-end experiments using pipelines.

GitOps, reproducibility, portability and traceability with AI support

Nowadays, developers (including Data Scientists) use Git and GitOps practices to store and share all code (e.g. notebooks, models) on development platforms (e.g. GitHub). GitOps best practices help with reproducibility and traceability for all projects.

One of the most important requirements for reproducibility is dependency management. Having dependencies clearly stated allows for reusability and portability of notebooks, which can be reused in other projects and shared safely with other Data Scientists.
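For example, clearly stated dependencies for a notebook project might look like the following requirements file. The exact packages and pins are illustrative, not this tutorial's actual dependency stack:

```
# Illustrative requirements.txt: exact pins make the environment reproducible,
# so anyone re-running the notebook resolves the same dependency stack.
tensorflow==2.4.1
numpy==1.19.5
pandas==1.2.3
```

Pinning exact versions (rather than open ranges) is what makes a notebook portable: two Data Scientists installing from the same file get the same environment.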

(WIP) If you want to know more about this issue in the data science domain, have a look at this article.

Project Thoth helps developers keep these dependencies up to date by integrating its recommendations into developers' daily tools. If you want to know more, have a look here. Thanks to this tooling, developers (including Data Scientists) do not have to worry too much about dependency management (they still need to select their dependencies), which can be handled by a bot and automated pipelines. Hence, having AI support can improve the development of AI projects, speeding up steps thanks to performance improvements coming from dependencies, and keeping your application secure because insecure libraries cannot be introduced.

Automated pipelines and bots for your GitHub project

  • Kebechet Bot, to keep your dependencies fresh and up to date, receiving recommendations and justifications powered by AI.

  • AICoE Pipeline, to support your AI project lifecycle.

All these tools are integrated with the project-template, so most things are already set up for you. One important task in maintaining your code is creating tags during your project development lifecycle. Moreover, in order to deploy your application you need to create a container image. The use of GitHub templates integrated with bots provides automated pipelines triggered depending on what you need (e.g. a release (patch, minor, major), delivering a container image, dependency updates).
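As a minimal sketch of the tagging step, the commands below create a patch-release tag in a throwaway repository. The tag name, message, and commit are illustrative only; in the tutorial the release bots react to tags like this:

```shell
# Create a throwaway repo so the example is self-contained.
cd "$(mktemp -d)"
git init -q .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
# Tag a (hypothetical) patch release; automated pipelines can trigger on tags.
git tag -a v0.1.1 -m "Release v0.1.1"
git tag --list
```

Pushing such a tag (`git push origin v0.1.1`) is what lets the automated pipelines pick up the release event.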

Project templates

The project template used can be found here: project-template. It shows the correlation between Data Scientists' requirements (e.g. data, notebooks, models) and AI DevOps engineers' ones (e.g. manifests). Using a project template allows for shareability, because anyone taking over the project, or looking for something specific about it, can immediately identify all the resources needed.
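A hypothetical layout following such a template might look like this (directory names are illustrative, not the template's exact contents):

```
project/
├── notebooks/        # exploration and training notebooks (Data Scientist)
├── src/              # Python scripts exported from notebooks
├── manifests/        # Kubernetes/OpenShift deployment manifests (DevOps)
├── Pipfile           # dependencies, clearly stated
└── Pipfile.lock      # resolved, reproducible dependency stack
```

Keeping data-science artifacts and deployment manifests side by side in one repository is what makes the GitOps practices above apply to the whole project.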

Tutorial Steps

  1. Pre-requisites

  2. Setup your initial environment

  3. Explore notebooks and manage dependencies

  4. Push changes to GitHub

  4.1 Benefit from bots to keep your dependencies fresh and up to date

  5. Create release, build image or overlays builds for different images

  6. Create an AI Pipeline

  7. Run and debug AI Pipeline

  8. Deploy Inference Application

  9. Test Deployed Inference Application

  10. Monitor your deployed inference application

NOTE: Each of the steps above can be repeated if you are following the ML lifecycle (e.g. changes in the dependencies, changes in the notebooks, a new model stored).

License: GNU General Public License v3.0

