The goal of this project is to build a data pipeline that creates a labeled dataset, train a machine learning model pipeline on that dataset, and deploy a Flask app on a Kubernetes cluster. Everything is managed in the cloud.
This project has 4 stages:
- Annotation Pipeline
  - This is the starting point for the main pipeline.
  - It generates a labeled dataset using the Azure Text Analytics API.
  - The entire dataset is stored in an AWS S3 bucket.
- Machine Learning Pipeline
  - This is the second pipeline.
  - The dataset created in the Annotation Pipeline is used to train our model.
  - The trained model is stored in an S3 bucket.
- REST Flask App
  - The trained model is wrapped in a Python Flask REST app.
  - The Flask app is tested inside a Docker container.
  - The Docker container is deployed on Google Kubernetes Engine.
- Inference Pipeline
  - The Inference Pipeline is an automated sentiment analysis pipeline.
  - It scrapes EDGAR earnings call transcript data and stores it in the cloud.
  - Using the Flask web app from Stage 3, it predicts the sentiment of each document.
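Stage 3 wraps the trained model in a Flask REST endpoint. A minimal sketch of what that app might look like — the `/predict` route name, the JSON payload shape, and the trivial keyword scorer standing in for the real TensorFlow model are illustrative assumptions, not the project's actual code:

```python
# Sketch of the Stage 3 REST app. The /predict route, payload shape, and the
# keyword-based placeholder (in place of the trained model loaded from S3)
# are assumptions for illustration only.
from flask import Flask, jsonify, request

app = Flask(__name__)

NEGATIVE_WORDS = {"loss", "decline", "risk"}  # placeholder for real inference


def score_sentiment(text: str) -> str:
    """Stand-in for model inference; the real app would use the trained model."""
    words = set(text.lower().split())
    return "negative" if words & NEGATIVE_WORDS else "positive"


@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body like {"text": "..."} and return the predicted label.
    text = request.get_json(force=True).get("text", "")
    return jsonify({"sentiment": score_sentiment(text)})


# app.run(host="0.0.0.0", port=5000)  # uncomment to serve locally
```

In the actual pipeline, `docker build` packages an app like this together with the model artifact so it can be deployed to the cluster.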
These instructions will get a copy of the project up and running in your local environment using cloud infrastructure.
git clone https://github.com/Dhruv-Panchal/ml-as-a-service-pipeline
- Python 3.7
- AWS account
- GCP account
- Microsoft Azure account
Install the required Python packages:
pip3 install -r requirements.txt
- Create multiple AWS S3 buckets.
- Configure an IAM role with full S3 bucket access in your local environment. Learn more here.
- Create a GCP account. Get started here.
- Create an Azure account. Get started here.
- Request a Metaflow sandbox to run your pipeline on AWS Batch.
- Once everything is set up, configure Metaflow's sandbox by running
metaflow configure sandbox
on the CLI and entering the API keys from Step 1.
- Configure the input/output buckets on AWS S3 and enter the bucket names in the Annotation Pipeline, ML Pipeline, Inference Pipeline, and Flask App.
- Lastly, add the Azure API keys here.
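The annotation stage sends each document to the Azure Text Analytics sentiment endpoint and keeps the highest-confidence class as the label. A hedged sketch of that label-selection logic — the function name and row shape are assumptions, and the actual Azure call is left commented out so the snippet runs without credentials:

```python
# Sketch of the Stage 1 labeling step. The to_label helper and the row shape
# are illustrative; the score dict mirrors the confidence_scores fields
# returned by the azure-ai-textanalytics SDK.

def to_label(confidence_scores: dict) -> str:
    """Pick the sentiment class with the highest confidence score."""
    return max(confidence_scores, key=confidence_scores.get)

# With real credentials, the scores would come from the API, roughly:
#   from azure.core.credentials import AzureKeyCredential
#   from azure.ai.textanalytics import TextAnalyticsClient
#   client = TextAnalyticsClient(endpoint=ENDPOINT, credential=AzureKeyCredential(KEY))
#   doc = client.analyze_sentiment(["Revenue grew 20% this quarter."])[0]
#   scores = {"positive": doc.confidence_scores.positive,
#             "neutral": doc.confidence_scores.neutral,
#             "negative": doc.confidence_scores.negative}

labeled_row = {
    "text": "Revenue grew 20% this quarter.",
    "label": to_label({"positive": 0.92, "neutral": 0.06, "negative": 0.02}),
}
```

Rows like this, accumulated over the whole corpus, form the labeled dataset that the pipeline writes to the S3 bucket.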
Run on the CLI:
- Change the permissions of the files
chmod a+x Annotation\ Pipeline/index.py ML\ Pipeline/index.py Inference\ Pipeline/index.py
- Running the Annotation Pipeline
./Annotation\ Pipeline/index.py run --with sandbox
- Running the Machine Learning Pipeline
./ML\ Pipeline/index.py run --with sandbox
- Creating a Docker container of the Flask app
cd REST\ Flask\ App/
docker build -t yourhubusername/reponame .
docker login --username=yourhubusername
docker push yourhubusername/reponame
Once the Dockerized Flask app has been pushed to the repository in Step 3,
create a Kubernetes cluster on Google Cloud Platform and deploy your Docker image from Docker Hub. Learn more here.
Now your Flask app is up and accessible from anywhere in the world!
- Add the required Tickerfile bucket location in the Inference Pipeline.
- Add the IP address and port number obtained from the GCP Kubernetes cluster in the Inference Pipeline.
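Once the cluster IP and port are configured, the Inference Pipeline can call the deployed endpoint for each scraped transcript. A minimal client sketch using only the standard library — the placeholder URL, `/predict` route, and JSON payload shape are assumptions based on a typical Flask deployment, not the project's confirmed API:

```python
# Sketch of how the Inference Pipeline might call the deployed Flask app.
# The service URL, route, and payload shape are illustrative assumptions.
import json
import urllib.request

# Replace with the external IP address and port reported by the GCP cluster.
SERVICE_URL = "http://<cluster-ip>:<port>/predict"  # hypothetical placeholder


def build_payload(transcript_text: str) -> bytes:
    """Encode a transcript as the JSON body the Flask app would expect."""
    return json.dumps({"text": transcript_text}).encode("utf-8")


def predict_sentiment(transcript_text: str, url: str = SERVICE_URL) -> str:
    """POST one transcript to the service and return the predicted label."""
    req = urllib.request.Request(
        url,
        data=build_payload(transcript_text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["sentiment"]
```

The pipeline would loop over the scraped EDGAR transcripts stored in the cloud, calling `predict_sentiment` on each one.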
- Metaflow - Data Pipeline Framework
- TensorFlow - Machine Learning Model
- Docker - Container Environment
- Flask - Web Framework
- AWS Batch - Cloud Infrastructure for Big Data Pipeline
- Azure Text Analytics API - NLP Text Analytics API
- Google Kubernetes Engine - Managed Kubernetes Cluster
- Dhruv Panchal - Research and Development - LinkedIn
- Kashish Shah - Design, Architecture and Deployment - LinkedIn
- Manogana Mantripragada - Machine Learning Engineer - LinkedIn
This project is licensed under the Commons Clause License - see the LICENSE.md file for details