azmd801 / back_order_prediction_end2end

This repo contains an end-to-end solution for predicting product back-orders using machine learning. It includes data preprocessing, model training, and evaluation tools, designed to help businesses mitigate supply chain disruptions.


Problem Statement

Backorders are a pervasive issue in the retail and supply chain sectors, causing unexpected strains on production, logistics, and transportation. The analysis aims to construct a predictive model that can forecast backorders based on past data from inventories, supply chain, and sales, thereby aiding in streamlined planning and prevention of unforeseen operational strains.

This is a binary classification problem: the goal is to classify each product as going into backorder or not (Yes or No), using the large volume of historical data generated by ERP systems.

Solution Proposed

The project develops a machine learning model to predict the likelihood of a product going into backorder. It uses historical data covering inventory levels, supply chain dynamics, sales patterns, and other pertinent factors.

Because this is a binary classification problem, the target takes two values:

  • Yes: the product will go into backorder.
  • No: the product will not go into backorder.

For this problem, accurately predicting both classes is essential. False positives (products flagged for backorder that do not go into backorder) can result in excess costs and wasted resource allocation. False negatives (products that go into backorder without being flagged) can lead to missed opportunities and customer dissatisfaction.
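
The trade-off between false positives and false negatives can be made concrete by counting the four outcomes of a binary classifier. The sketch below is illustrative only: the label lists are made up, and the helper names `confusion_counts` and `precision_recall` are hypothetical, not part of this repository.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="Yes"):
    """Count true/false positives and negatives for a binary label."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            # Predicted backorder: correct if the product really went into backorder.
            counts["tp" if actual == positive else "fp"] += 1
        else:
            # Predicted no backorder: a miss if the product did go into backorder.
            counts["fn" if actual == positive else "tn"] += 1
    return counts


def precision_recall(counts):
    """Return (precision, recall) for the positive class, 0.0 when undefined."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels, for illustration only.
y_true = ["Yes", "No", "No", "Yes", "No", "Yes"]
y_pred = ["Yes", "Yes", "No", "No", "No", "Yes"]

counts = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(counts)
print(counts["tp"], counts["fp"], counts["fn"], counts["tn"])
print(round(precision, 3), round(recall, 3))
```

Low precision corresponds to many false positives (excess cost), and low recall corresponds to many false negatives (missed backorders), so both numbers are worth tracking when evaluating a model.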

Tech Stack Used

  1. Python
  2. FastAPI
  3. Machine learning algorithms
  4. Docker
  5. MongoDB

Infrastructure Required

  1. AWS S3
  2. AWS EC2
  3. AWS ECR
  4. GitHub Actions

How to run?

Before running the project, make sure MongoDB is installed on your local system, along with Compass, since the project uses MongoDB for data storage. You also need an AWS account to access services such as S3, ECR, and EC2.

To set up the AWS services, refer to this link:

Step 1: Clone the repository

git clone

Step 2: Create a conda environment after opening the repository

conda create -n sensor python=3.7.6 -y
conda activate sensor

Step 3: Install the requirements

pip install -r requirements.txt

Step 4: Export the environment variable

export MONGODB_URL="mongodb+srv://<username>:<password>"
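
Since the application depends on `MONGODB_URL` being exported, it can help to fail early with a clear message when the variable is missing. The following is a minimal sketch of such a check using only the standard library; the function name `get_mongodb_url` is hypothetical and the connection string in the demo is a made-up placeholder, not a value from this project.

```python
import os

ENV_KEY = "MONGODB_URL"


def get_mongodb_url(environ=os.environ):
    """Return the MongoDB connection string, or raise a clear error."""
    url = environ.get(ENV_KEY, "").strip()
    if not url:
        raise RuntimeError(f"{ENV_KEY} is not set; export it before starting the app")
    # MongoDB connection strings use one of these two URI schemes.
    if not url.startswith(("mongodb://", "mongodb+srv://")):
        raise ValueError(f"{ENV_KEY} does not look like a MongoDB connection string")
    return url


# Demo with an explicit mapping (placeholder value) instead of the real environment.
demo_env = {"MONGODB_URL": "mongodb+srv://user:secret@cluster.example.net"}
print(get_mongodb_url(demo_env))
```

Called without arguments, the function reads the real process environment, so it could run once at application startup.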

Step 5: Run the application server


Step 6: Train the application


Step 7: Run the prediction application


Run locally

  1. Check if the Dockerfile is available in the project directory

  2. Build the Docker image

docker build -t sensor .

  3. Run the Docker image, with the AWS and MongoDB variables already exported in your shell

docker run -d -e AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID}" -e AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY}" -e AWS_DEFAULT_REGION="${AWS_DEFAULT_REGION}" -e MONGODB_URL="${MONGODB_URL}" -p 8080:8080 sensor
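
The `docker run` command passes four settings into the container, and a missing one usually only shows up later as a runtime failure. As a rough sketch, the command can be assembled in Python with a check for missing variables first. The helper name `build_docker_run_args` is hypothetical; the image name `sensor` and port 8080 are taken from the command above.

```python
import os

REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
    "MONGODB_URL",
)


def build_docker_run_args(environ=os.environ, image="sensor", port=8080):
    """Build the `docker run` argument list, refusing to continue if a variable is unset."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    args = ["docker", "run", "-d"]
    for name in REQUIRED_VARS:
        # "-e NAME" without a value makes Docker copy NAME from the caller's
        # environment, which keeps secret values out of the argument list.
        args += ["-e", name]
    args += ["-p", f"{port}:{port}", image]
    return args


# Demo with placeholder values; nothing is executed here.
demo_env = {name: "placeholder" for name in REQUIRED_VARS}
print(" ".join(build_docker_run_args(demo_env)))
```

To actually start the container, the list returned by `build_docker_run_args()` (called without arguments, so it checks the real environment) could be handed to `subprocess.run(..., check=True)`.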

Run on cloud

First, set up an AWS EC2 machine and create an ECR repository. Refer to this link for the setup steps.

Launch EC2 CLI

Follow these steps to access and connect to your running instance on AWS EC2:

  1. Navigate to the AWS Management Console: Open your web browser and go to the AWS Management Console. Log in with your AWS account credentials.

  2. Access EC2 Console: In the navigation pane, select "EC2" to access the Elastic Compute Cloud console.

  3. View Instances: Click on "Instances" to see a list of your instances.

  4. Identify Running Instance: Locate the specific running instance you want to connect to. Click on its instance ID to access its details.

  5. Connect to the Instance: Within the instance details, find and click on the "Connect" button. This action will open the EC2 CLI interface, providing you with connection details and options.

Now that you have connected to your EC2 instance, you can proceed to install Docker on the machine.

Install Docker on the EC2 Machine

Follow these steps to install Docker on your EC2 machine:

# Install Docker using the apt repository
# Set up Docker's apt repository
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add the repository to Apt sources
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

# Install Docker packages
# For the latest version
sudo apt-get install docker-ce docker-ce-cli docker-buildx-plugin docker-compose-plugin

# Verify Docker installation
sudo docker run hello-world

# Add your user to the "docker" group to run Docker commands without sudo
sudo usermod -aG docker $USER

# Restart the Docker service; log out and back in for the group change to take effect
sudo systemctl restart docker

Deploy using GitHub Actions

GitHub Actions lets us automate deployment with a CI/CD pipeline.

1. Add GitHub Actions Workflow File:

  • Inside your repository, create a new file named .github/workflows/main.yml (GitHub only picks up workflow files from the .github/workflows directory).
  • Edit the main.yml file and add instructions for Dockerizing and shipping the image to ECR, and pulling it on an EC2 machine. You can use the sample workflow provided earlier.

2. Configure GitHub Actions Workflow:

  • Customize the workflow file by replacing placeholders with your specific details (e.g., AWS credentials, ECR details, Dockerfile location).
  • Commit and push the changes to trigger the workflow.

3. Go to Repository Settings:

  • In your GitHub repository, navigate to the "Settings" tab.

4. Add Secrets:

  • In the left sidebar, click on "Secrets and variables", then "Actions".
  • Click on "New repository secret."
  • Create secrets for your AWS credentials, ECR details, and any other sensitive information used in your workflow.

5. Create a Self-Hosted Runner:

  • In the repository "Settings" tab, open "Actions" and then "Runners".
  • Click on "New self-hosted runner" to set up a self-hosted runner.
  • Follow the instructions to download and configure the runner on your EC2 machine.

6. Start Self-Hosted Runner:

  • On your EC2 machine, navigate to the folder where you extracted the self-hosted runner files.
  • Run ./ install to install the runner as a service.
  • Run ./ start to start the self-hosted runner.

7. Verify Workflow Execution:

  • Make a change to your code and push it to the repository.
  • Check the "Actions" tab to see the progress of the workflow.
  • The workflow should automatically trigger, dockerize your application, push the image to ECR, and pull it on your EC2 machine.
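
When the workflow pushes to ECR and the EC2 machine pulls from it, both sides must agree on the full image name. A small sketch of how that name is composed for a private ECR registry in the standard AWS partition follows; the helper names, the account ID, and the repository name `backorder` are placeholders chosen for illustration, not values from this project.

```python
def ecr_registry(account_id, region):
    """Host name of a private ECR registry in the standard AWS partition."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Full image reference used by `docker push` and `docker pull`."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12-digit numbers")
    return f"{ecr_registry(account_id, region)}/{repository}:{tag}"


# Placeholder account ID and repository name, for illustration only.
uri = ecr_image_uri("123456789012", "us-east-1", "backorder")
print(uri)
```

Keeping the account ID, region, and repository name in repository secrets and building the URI from them means the same value can be reused in the build, push, and pull steps.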



License: MIT License


Languages: Jupyter Notebook 99.6%, Python 0.4%, Dockerfile 0.0%