Project: At what price will you buy a car?

This project was created for the 2022 cohort of the ML Zoomcamp course.

This is a regression problem: we try to predict the amount of money that a customer is willing to spend on a car, given some features of that customer.
Knowing this helps car dealers make better decisions about whom to sell to and how.

During EDA and model creation I realized that the dataset is synthetic and that the target variable is a linear combination of the features, so the model itself can be described as a linear combination of the features.

I still decided to build a model and go through all the steps of the project.
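The observation about linearity can be illustrated with a small sketch. It is not the project's code: the feature ranges, coefficients, and helper names below are hypothetical, and the data is generated on the spot. It shows that when a target is an exact linear combination of the features, an ordinary least squares fit recovers the coefficients and leaves residuals close to zero.

```python
import random


def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    A constant 1.0 is appended to every row so the last weight is the intercept.
    """
    rows = [list(r) + [1.0] for r in X]
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for k in range(n):
            if k != col:
                factor = A[k][col] / A[col][col]
                A[k] = [a - factor * b for a, b in zip(A[k], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(weights, row):
    """Apply fitted weights (intercept last) to one feature row."""
    return sum(w * x for w, x in zip(weights, list(row) + [1.0]))


random.seed(0)
true_weights = [0.5, 1.2, 0.03]  # hypothetical coefficients, not taken from the dataset
intercept = 10.0                 # hypothetical intercept

# Synthetic rows: three made-up numeric features on different scales.
X = [
    [random.uniform(20, 70), random.uniform(20, 100), random.uniform(10, 1000)]
    for _ in range(200)
]
y = [sum(w * x for w, x in zip(true_weights, row)) + intercept for row in X]

weights = fit_linear(X, y)
max_residual = max(abs(predict(weights, row) - t) for row, t in zip(X, y))
print(weights, max_residual)
```

A near-zero maximum residual on real data would be a strong hint that the target was generated by a linear formula.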

Sources of data

In this project, I used the data from the ANN - Car Sales Price Prediction dataset on Kaggle.

Dataset description:
As a vehicle salesperson, you would like to create a model that can estimate the overall amount that consumers would spend given the following characteristics:

customer name
customer email
country
gender
age
annual salary
credit card debt
and net worth

During EDA I decided to enrich the dataset with information about each customer's country. For this I used the Countries of the World dataset from Kaggle.
I added the following features:

Region
Population
Area (sq. mi.)
Pop. Density (per sq. mi.)
Coastline (coast/area ratio)
GDP ($ per capita)
Birthrate
Deathrate
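The enrichment step amounts to a left join on the country name. Below is a minimal sketch using only the standard library. The key column names (`country`, `Country`), the helper name, and all values are assumptions made for illustration; the values are placeholders, not real statistics, and the project itself may do this differently (for example with pandas).

```python
def enrich_with_country(customers, countries):
    """Left-join customer rows with country rows on a normalized country name.

    Customers whose country is missing from the lookup are kept unchanged.
    """
    by_name = {c["Country"].strip().lower(): c for c in countries}
    enriched = []
    for row in customers:
        merged = dict(row)
        extra = by_name.get(row["country"].strip().lower(), {})
        for key, value in extra.items():
            if key != "Country":  # the join key is already present as "country"
                merged[key] = value
        enriched.append(merged)
    return enriched


# Placeholder rows, not real data.
customers = [
    {"country": "Bulgaria", "age": 42},
    {"country": "Atlantis", "age": 30},
]
countries = [
    {"Country": "Bulgaria ", "Region": "REGION_A", "Population": 1000},
]

enriched = enrich_with_country(customers, countries)
print(enriched)
```

Normalizing the key (trimming whitespace, lowercasing) before the lookup is a defensive choice, since two independently collected datasets rarely spell country names identically.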

Project structure:

notebooks - Folder with notebooks
- EDA - Exploratory data analysis and data preparation
- Model selection - Model creation and selection
scripts - Folder with scripts
- data preparation - Script for data preparation
- model training - Script for model training
- model evaluation - Script for model evaluation
- vectorizers - Module with the vectorizer classes (I moved them to a separate module so that they can be pickled; I also copied this file to the notebooks folder for use in the notebooks, and to the bento folder for serving)
data - Folder with data
- raw - Folder with raw data
- processed - Folder with processed data (filled by notebooks/scripts) (created during training)
artifacts - Folder with artifacts of the project (model & vectorizer) (created during training)
bento - Folder with bentoML service
docker - Folder with docker files
terraform - Folder with terraform files for deployment
README.md - Project description
pipenv - Pipenv file with project dependencies
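The note about the vectorizers module reflects how pickle works: it stores a reference to a class (its module and name), not the class's code. The sketch below demonstrates this with a standard-library class instead of the project's vectorizers; the helper `pickled_class_path` is hypothetical and exists only for this illustration.

```python
import pickle
from fractions import Fraction


def pickled_class_path(obj):
    """Return the 'module.ClassName' path that pickle records for obj's class."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


value = Fraction(1, 2)
payload = pickle.dumps(value)

# The payload names the module and the class; it does not embed the class body.
print(pickled_class_path(value))
print(b"fractions" in payload and b"Fraction" in payload)

# Loading works here only because the `fractions` module is importable.
print(pickle.loads(payload) == value)
```

A class defined directly in a notebook or script is recorded under the `__main__` module, so a different process (such as the serving process) cannot resolve it when unpickling. Keeping the class in an importable module that is available in every place where the pickle is loaded avoids this.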

How to run the project:

Clone the repository
Install the dependencies

pipenv install

Prepare the data for training

pipenv run python scripts/1_data_preparation.py

Train the CatBoost model

pipenv run python scripts/2_model_training.py

Run sample prediction

pipenv run python scripts/3_model_evaluation.py

Run local bentoml service

cd bento
pipenv run bentoml serve --production

Then you can test the API at http://localhost:3000
or with curl:

curl -X 'POST' \
  'http://localhost:3000/predict' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"customer_name": "Martina Avila",
               "customer_email": "cubilia.Curae.Phasellus@quisaccumsanconvallis.edu",
               "country": "Bulgaria",
               "gender": 0,
               "age": 42,
               "annual_salary": 62812,
               "credit_card_debt": 11609.5,
               "net_worth": 238961.2}'
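The same request can be sent from Python. This is a minimal sketch using only the standard library; the function names are hypothetical, and the response is returned as raw text because the response format is not described here.

```python
import json
import urllib.request

SERVICE_URL = "http://localhost:3000/predict"  # same endpoint as in the curl example


def build_request(customer, url=SERVICE_URL):
    """Build a JSON POST request that mirrors the curl call."""
    body = json.dumps(customer).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def predict(customer, url=SERVICE_URL, timeout=10):
    """Send the request and return the raw response text."""
    with urllib.request.urlopen(build_request(customer, url), timeout=timeout) as response:
        return response.read().decode("utf-8")


customer = {
    "customer_name": "Martina Avila",
    "customer_email": "cubilia.Curae.Phasellus@quisaccumsanconvallis.edu",
    "country": "Bulgaria",
    "gender": 0,
    "age": 42,
    "annual_salary": 62812,
    "credit_card_debt": 11609.5,
    "net_worth": 238961.2,
}

request = build_request(customer)
# Uncomment once the local service is running:
# print(predict(customer))
```

Separating `build_request` from `predict` makes it possible to inspect the payload without a running service.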

Containerization

The project is containerized with BentoML.
To build the container, run the following command:

cd bento
pipenv run bentoml build
pipenv run bentoml containerize what_price:latest

To run the container, run the following command, replacing the tag with the one printed by bentoml build:

docker run -it --rm -p 3000:3000 what_price:uqfzsys5isn3caav

Then you can test the API at http://localhost:3000

The Dockerfile generated by BentoML is located in the docker folder.

Deployment

For deployment, I used AWS Lambda.
I followed the deployment tutorial from the BentoML documentation.

Install bentoctl

pip install bentoctl

Install the AWS Lambda operator for bentoctl

bentoctl operator install aws-lambda

Initialize deployment with bentoctl

bentoctl init

Build and push the AWS Lambda compatible Docker image to the registry

bentoctl build -b what_price:latest -f deployment_config.yaml

Deploy to AWS Lambda

terraform init
terraform apply -var-file=bentoctl.tfvars -auto-approve

You can try the deployed API in a browser
or with curl:

curl -X 'POST' \
  'https://dvayarixs2.execute-api.us-west-1.amazonaws.com/predict' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"customer_name": "Martina Avila",
               "customer_email": "cubilia.Curae.Phasellus@quisaccumsanconvallis.edu",
               "country": "Bulgaria",
               "gender": 0,
               "age": 42,
               "annual_salary": 62812,
               "credit_card_debt": 11609.5,
               "net_worth": 238961.2}'

Destroy deployment

bentoctl destroy -f deployment_config.yaml

rzabolotin / ml_zoomcamp_2022_project_1
