This project was created during the 2022 cohort of the ML Zoomcamp course.
It solves a regression problem: predicting the amount of money a customer is willing to spend on a car, given some features of that customer.
Knowing this helps car dealers make better decisions about whom to sell to and how.
During EDA and model creation I realized that the dataset is synthetic and that the target variable is a linear combination of the features, so the model itself can be described as a linear combination of the features.
I still decided to build a model and go through all the steps of the project.
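One way to check the linearity observation is to fit ordinary least squares and inspect the residuals. The sketch below runs on synthetic stand-in data, not on the Kaggle dataset: the three features, their ranges, and the coefficients are all made up for illustration, and NumPy is assumed to be installed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in features; NOT the real dataset.
n = 200
X = np.column_stack([
    rng.uniform(20, 70, n),            # age
    rng.uniform(20_000, 100_000, n),   # annual salary
    rng.uniform(0, 1_000_000, n),      # net worth
])

# Made-up coefficients used to build an exactly linear target.
true_coef = np.array([800.0, 0.55, 0.03])
true_intercept = -40_000.0
y = X @ true_coef + true_intercept

# Fit ordinary least squares; the column of ones models the intercept.
A = np.column_stack([X, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# If the target is a linear combination of the features, the residuals
# are zero up to floating-point error.
max_abs_residual = float(np.max(np.abs(A @ coef - y)))
is_linear = max_abs_residual < 1e-6 * float(np.max(np.abs(y)))
print(f"max |residual| = {max_abs_residual:.3g}, linear: {is_linear}")
```

On real data the same check applies after encoding the categorical columns; residuals far above floating-point noise would indicate that the target is not purely linear in the chosen features.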
In this project, I used the data from the ANN - Car Sales Price Prediction dataset on Kaggle.
Dataset description:
As a vehicle salesperson, you would like to create a model that can estimate the overall amount that consumers would spend given the following characteristics:
- customer name
- customer email
- country
- gender
- age
- annual salary
- credit card debt
- and net worth
During EDA I decided to enrich the dataset with information about each customer's country. For this I used the Countries of the World dataset from Kaggle.
I added the following features:
- Region
- Population
- Area (sq. mi.)
- Pop. Density (per sq. mi.)
- Coastline (coast/area ratio)
- GDP ($ per capita)
- Birthrate
- Deathrate
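The enrichment described above amounts to a left join on the country name. Below is a minimal, dependency-free sketch of that idea. The column names `country` and `Country`, the `enrich` and `normalize` helpers, and all row values are assumptions for illustration; they are not taken from the project code or from either Kaggle dataset.

```python
def normalize(name):
    """Normalize a country name so stray whitespace or case does not break the join."""
    return name.strip().casefold()


def enrich(customers, countries, features):
    """Left-join country-level features onto customer rows by country name."""
    lookup = {normalize(row["Country"]): row for row in countries}
    enriched = []
    for customer in customers:
        # Customers whose country has no match keep their row, with None features.
        match = lookup.get(normalize(customer["country"]), {})
        extra = {feature: match.get(feature) for feature in features}
        enriched.append({**customer, **extra})
    return enriched


# Made-up rows for illustration only.
customers = [
    {"customer_name": "Customer A", "country": "Bulgaria"},
    {"customer_name": "Customer B", "country": "Nowhereland"},
]
countries = [
    {"Country": "Bulgaria ", "Region": "EXAMPLE REGION", "Population": 1000},
]

rows = enrich(customers, countries, ["Region", "Population"])
for row in rows:
    print(row)
```

Normalizing the key before joining matters when the two sources spell or pad country names differently; unmatched customers are kept with empty features so that no training rows are silently dropped.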
- notebooks - Folder with notebooks
  - EDA - Exploratory data analysis and data preparation
  - Model selection - Model creation and selection
- scripts - Folder with scripts
  - data preparation - Script for data preparation
  - model training - Script for model training
  - model evaluation - Script for model evaluation
  - vectorizers - Module with the vectorizer classes. I moved them to a separate module to be able to pickle them, and copied the file to the notebooks folder (for use in the notebooks) and to the bento folder (for serving).
- data - Folder with data
- artifacts - Folder with artifacts of the project (model & vectorizer), created during training
- bento - Folder with the BentoML service
- docker - Folder with Docker files
- terraform - Folder with Terraform files for deployment
- README.md - Project description
- pipenv - Pipenv file with project dependencies
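The reason the vectorizer classes need their own module is that pickle stores a reference to the defining module and class name, not the class body, so the class must be importable under the same path wherever the artifact is loaded. The sketch below demonstrates this with an in-memory module; the names `vectorizers_demo` and `CountryVectorizer` are hypothetical and are not the project's actual names.

```python
import pickle
import sys
import types

# Hypothetical class source, used only for this demonstration.
SOURCE = '''
class CountryVectorizer:
    def __init__(self, mapping):
        self.mapping = mapping
'''

# Create an importable module in memory, as if vectorizers_demo.py were on the path.
module = types.ModuleType("vectorizers_demo")
exec(SOURCE, module.__dict__)
sys.modules["vectorizers_demo"] = module

payload = pickle.dumps(module.CountryVectorizer({"Bulgaria": 0}))

# The stream stores the module and class names, not the class body.
stores_reference = b"vectorizers_demo" in payload and b"CountryVectorizer" in payload

# With the module importable, loading succeeds.
restored = pickle.loads(payload)

# Without it (for example, a service process that lacks the file), loading fails.
del sys.modules["vectorizers_demo"]
try:
    pickle.loads(payload)
    load_failed = False
except ModuleNotFoundError:
    load_failed = True

print(stores_reference, restored.mapping, load_failed)
```

This is why the same file has to be present next to the notebooks, the training scripts, and the service: each of them unpickles the artifact and must resolve the same module path.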
- Clone the repository
- Install the dependencies
pipenv install
- Prepare the data for training
pipenv run python scripts/1_data_preparation.py
- Train the CatBoost model
pipenv run python scripts/2_model_training.py
- Run sample prediction
pipenv run python scripts/3_model_evaluation.py
- Run the local BentoML service
cd bento
pipenv run bentoml serve --production
Then you can test the API at http://localhost:3000
or with curl:
curl -X 'POST' \
'http://localhost:3000/predict' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{"customer_name": "Martina Avila",
"customer_email": "cubilia.Curae.Phasellus@quisaccumsanconvallis.edu",
"country": "Bulgaria",
"gender": 0,
"age": 42,
"annual_salary": 62812,
"credit_card_debt": 11609.5,
"net_worth": 238961.2}'
The project is containerized with BentoML.
To build the container, run the following command:
cd bento
pipenv run bentoml build
pipenv run bentoml containerize what_price:latest
To run the container, run the following command, replacing the tag with the one printed by bentoml containerize:
docker run -it --rm -p 3000:3000 what_price:uqfzsys5isn3caav
Then you can test the API at http://localhost:3000
The Dockerfile created by BentoML is located in the docker folder.
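Whether the service runs through bentoml serve or inside the container, the /predict endpoint exercised with curl earlier can also be called from Python. The sketch below uses only the standard library; the URL and payload are copied from the curl example, while the `build_request` and `predict` helpers are hypothetical names introduced here.

```python
import json
import urllib.request

# Endpoint and payload copied from the curl example.
URL = "http://localhost:3000/predict"
payload = {
    "customer_name": "Martina Avila",
    "customer_email": "cubilia.Curae.Phasellus@quisaccumsanconvallis.edu",
    "country": "Bulgaria",
    "gender": 0,
    "age": 42,
    "annual_salary": 62812,
    "credit_card_debt": 11609.5,
    "net_worth": 238961.2,
}


def build_request(url, data):
    """Build a JSON POST request equivalent to the curl command."""
    body = json.dumps(data).encode("utf-8")
    headers = {"accept": "application/json", "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def predict(request, timeout=10):
    """Send the request and parse the JSON reply; the service must be running."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_request(URL, payload)
print(request.get_method(), request.full_url)
```

With the service running, `predict(request)` returns the parsed JSON response. The shape of that response is not documented here, so inspect it before relying on specific fields.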
For deployment, I used AWS Lambda.
I followed the tutorial from the BentoML documentation.
- Install bentoctl
pip install bentoctl
- Install the AWS Lambda operator
bentoctl operator install aws-lambda
- Initialize deployment with bentoctl
bentoctl init
- Build and push the AWS Lambda compatible Docker image to the registry
bentoctl build -b what_price:latest -f deployment_config.yaml
- Deploy to AWS Lambda
terraform init
terraform apply -var-file=bentoctl.tfvars -auto-approve
- You can try the deployed API in the browser
or with curl:
curl -X 'POST' \
'https://dvayarixs2.execute-api.us-west-1.amazonaws.com/predict' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{"customer_name": "Martina Avila",
"customer_email": "cubilia.Curae.Phasellus@quisaccumsanconvallis.edu",
"country": "Bulgaria",
"gender": 0,
"age": 42,
"annual_salary": 62812,
"credit_card_debt": 11609.5,
"net_worth": 238961.2}'
- Destroy deployment
bentoctl destroy -f deployment_config.yaml