rzabolotin / ml_zoomcamp_2022_project_1

Predict the amount of money that a customer is willing to spend on a car

Project: At what price will you buy a car?

This project was created during the 2022 cohort of the ML Zoomcamp course.

This is a regression problem: we try to predict the amount of money that a customer is willing to spend on a car, given some features of the customer.
Knowing this will help car dealers make better decisions about whom to sell their cars to, and how.

During EDA and model creation I realized that the dataset is synthetic and the target variable is a linear combination of the features, so the model itself can be described as a linear combination of the features.

I still decided to build a model and go through all the steps of the project.
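A minimal sketch of how such a linear dependence can be detected: fit ordinary least squares and check that R² is essentially 1. The data below is a synthetic stand-in with made-up weights, not the Kaggle dataset, and the helper names (`solve_linear_system`, `fit_linear`, `r_squared`) are hypothetical and not taken from the project's notebooks.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copy rows so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations.

    Returns [intercept, w1, w2, ...].
    """
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def r_squared(rows, targets, coefs):
    """Coefficient of determination of the fitted linear model."""
    preds = [coefs[0] + sum(w * x for w, x in zip(coefs[1:], r)) for r in rows]
    mean_t = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, preds))
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in: three features in [0, 1] and a noise-free linear target.
# The weights are invented for illustration only.
random.seed(0)
true_coefs = [5.0, 2.0, 3.0, -1.0]
rows = [[random.random() for _ in range(3)] for _ in range(200)]
targets = [true_coefs[0] + sum(w * x for w, x in zip(true_coefs[1:], r)) for r in rows]

coefs = fit_linear(rows, targets)
score = r_squared(rows, targets, coefs)
print("recovered coefficients:", [round(c, 6) for c in coefs])
print("R^2:", score)
```

On real data the features would first need scaling (the normal equations are numerically fragile when columns differ by orders of magnitude), and an R² close to 1 on held-out rows is the signal that the target is a linear function of the inputs.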

Sources of data

In this project, I used the data from the ANN - Car Sales Price Prediction dataset on Kaggle.

Dataset description:
As a vehicle salesperson, you would like to create a model that can estimate the overall amount that consumers would spend given the following characteristics:

  • customer name
  • customer email
  • country
  • gender
  • age
  • annual salary
  • credit card debt
  • and net worth

During EDA I decided to enrich the dataset with country-level information. For this I used the Countries of the world dataset from Kaggle.
I added the following features:

  • Region
  • Population
  • Area (sq. mi.)
  • Pop. Density (per sq. mi.)
  • Coastline (coast/area ratio)
  • GDP ($ per capita)
  • Birthrate
  • Deathrate
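The enrichment described above amounts to a left join on the country name. The following is a rough sketch of that idea in plain Python. The function name `enrich_with_country`, the field names, and all values are placeholders for illustration; they are not the project's code and not real statistics.

```python
def enrich_with_country(customers, countries, key="country"):
    """Left join: keep every customer, attach country fields when available.

    Customers whose country is missing from the lookup get None for each
    extra field, so no rows are silently dropped.
    """
    # Normalize lookup keys; stray whitespace is a common reason for failed joins.
    lookup = {name.strip(): info for name, info in countries.items()}
    extra_fields = sorted({field for info in lookup.values() for field in info})

    enriched = []
    for customer in customers:
        info = lookup.get(customer[key].strip(), {})
        row = dict(customer)
        for field in extra_fields:
            row[field] = info.get(field)
        enriched.append(row)
    return enriched


# Placeholder records (values are invented, not taken from either dataset).
customers = [
    {"customer_name": "A", "country": "Bulgaria", "age": 42},
    {"customer_name": "B", "country": "Atlantis", "age": 35},
]
countries = {
    "Bulgaria ": {"region": "REGION_A", "gdp_per_capita": 1000},
}

enriched = enrich_with_country(customers, countries)
for row in enriched:
    print(row)
```

A left join is used here so that customers from countries absent in the second dataset survive the merge; how the resulting missing values are handled is a separate modelling decision.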

Project structure:

  • notebooks - Folder with notebooks
    • EDA - Exploratory data analysis and data preparation
    • Model selection - Model creation and selection
  • scripts - Folder with scripts
    • data preparation - Script for data preparation
    • model training - Script for model training
    • model evaluation - Script for model evaluation
    • vectorizers - Module with the vectorizer classes (I moved them to a separate module to be able to pickle them; I also copied this file to the notebooks folder so it can be used in the notebooks, and to the bento folder for serving)
  • data - Folder with data
    • raw - Folder with raw data
    • processed - Folder with processed data (filled by notebooks/scripts) (created during training)
  • artifacts - Folder with artifacts of the project (model & vectorizer) (created during training)
  • bento - Folder with bentoML service
  • docker - Folder with docker files
  • terraform - Folder with terraform files for deployment
  • README.md - Project description
  • pipenv - Pipenv file with project dependencies
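Why the vectorizer classes need their own importable module can be illustrated with a standard-library class. Pickle does not store a class's code, only a reference to it (module path plus class name), so whoever loads the pickle must be able to import the class under the same module path. The sketch below uses `fractions.Fraction` as a stand-in for the project's vectorizers; `class_reference` is a hypothetical helper.

```python
import pickle
from fractions import Fraction


def class_reference(obj):
    """Return the (module, qualified name) pair that pickle uses to find a class."""
    cls = type(obj)
    return cls.__module__, cls.__qualname__


value = Fraction(3, 4)
payload = pickle.dumps(value)

module_name, class_name = class_reference(value)

# The serialized bytes contain the module and class names, not the class body.
reference_is_stored = (
    module_name.encode() in payload and class_name.encode() in payload
)

# Loading works only because `fractions` is importable in this process.
restored = pickle.loads(payload)

print(module_name, class_name, reference_is_stored, restored == value)
```

By the same mechanism, a class defined directly in a notebook or training script is recorded under `__main__`, which a serving process cannot resolve. Keeping the classes in a module that is present next to the notebooks, the scripts, and the service avoids that.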

How to run the project:

  1. Clone the repository
  2. Install the dependencies
pipenv install
  3. Prepare the data for training
pipenv run python scripts/1_data_preparation.py
  4. Train the CatBoost model
pipenv run python scripts/2_model_training.py
  5. Run a sample prediction
pipenv run python scripts/3_model_evaluation.py
  6. Run the local BentoML service
cd bento
pipenv run bentoml serve --production

Then you can test the API at http://localhost:3000
or with curl:

curl -X 'POST' \
  'http://localhost:3000/predict' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"customer_name": "Martina Avila",
               "customer_email": "cubilia.Curae.Phasellus@quisaccumsanconvallis.edu",
               "country": "Bulgaria",
               "gender": 0,
               "age": 42,
               "annual_salary": 62812,
               "credit_card_debt": 11609.5,
               "net_worth": 238961.2}'
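The same request can be sent from Python. This is a sketch using only the standard library; the helper name `build_predict_request` is hypothetical, and the shape of the response is not documented here, so the example only prints whatever the service returns.

```python
import json
import urllib.request


def build_predict_request(payload, base_url="http://localhost:3000"):
    """Build a POST request for the /predict endpoint with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={
            "accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same sample customer as in the curl call above.
payload = {
    "customer_name": "Martina Avila",
    "customer_email": "cubilia.Curae.Phasellus@quisaccumsanconvallis.edu",
    "country": "Bulgaria",
    "gender": 0,
    "age": 42,
    "annual_salary": 62812,
    "credit_card_debt": 11609.5,
    "net_worth": 238961.2,
}

request = build_predict_request(payload)

# Sending the request requires the service to be running locally:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

Passing a different `base_url` lets the same helper target a deployed instance instead of the local one.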

Containerization

The project is containerized with BentoML.
To build the container, run the following command:

cd bento
pipenv run bentoml build
pipenv run bentoml containerize what_price:latest

To run the container, run the following command (replace the tag with the one printed by bentoml containerize for your build):

docker run -it --rm -p 3000:3000 what_price:uqfzsys5isn3caav

Then you can test the API at http://localhost:3000

The Dockerfile generated by BentoML is located in the docker folder.

Deployment

For deployment, I used AWS Lambda.
I followed the tutorial from the BentoML documentation.

  1. Install bentoctl
pip install bentoctl
  2. Install the AWS Lambda operator for bentoctl
bentoctl operator install aws-lambda
  3. Initialize the deployment with bentoctl
bentoctl init
  4. Build the AWS Lambda compatible Docker image and push it to the registry
bentoctl build -b what_price:latest -f deployment_config.yaml
  5. Deploy to AWS Lambda
terraform init
terraform apply -var-file=bentoctl.tfvars -auto-approve
  6. Try the API in the cloud from a browser, or with curl:
curl -X 'POST' \
  'https://dvayarixs2.execute-api.us-west-1.amazonaws.com/predict' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"customer_name": "Martina Avila",
               "customer_email": "cubilia.Curae.Phasellus@quisaccumsanconvallis.edu",
               "country": "Bulgaria",
               "gender": 0,
               "age": 42,
               "annual_salary": 62812,
               "credit_card_debt": 11609.5,
               "net_worth": 238961.2}'
  7. Destroy the deployment
bentoctl destroy -f deployment_config.yaml

About

License: Apache License 2.0


Languages

Jupyter Notebook 96.4%, Python 2.3%, HCL 0.7%, Shell 0.3%, Dockerfile 0.2%