cvargas-xbrein / Car-Evaluation-Proyect

Following a structured methodology, we build a robust ML model that helps when evaluating a vehicle.

Home Page: https://car-evaluation-app.herokuapp.com/

Car Evaluation

Definition of Problem

Features of Dataset

  • Buying: Purchase price level of the vehicle.

  • Maint: Maintenance cost level of the vehicle.

  • Doors: Number of doors of the vehicle.

  • Persons: Number of people the car can carry.

  • Lug Boot: Size of the car's trunk.

  • Safety: Safety level of the vehicle.

Almost all variables are well balanced, except for the class label, which is the variable to predict. As a result, the model can become biased and favor the class with the largest share of samples in its predictions.
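A quick way to check this kind of imbalance is to count the share of each class label. The snippet below is a minimal sketch: the function name `class_proportions` and the toy labels `"a"`, `"b"`, `"c"` are hypothetical and do not reflect the real class distribution of the data set.

```python
from collections import Counter


def class_proportions(labels):
    """Return the share of each class label, most frequent first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.most_common()}


# Hypothetical toy labels, NOT the project's real class distribution.
labels = ["a"] * 7 + ["b"] * 2 + ["c"] * 1
proportions = class_proportions(labels)

for label, share in proportions.items():
    print(f"{label}: {share:.0%}")
```

A classifier that always predicts the most frequent label would already be right for that label's share of the samples, which is why plain accuracy can hide the bias.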

Project Summary

Approach

The class variable has a clear imbalance, so we apply the SMOTE transformation (Synthetic Minority Over-sampling Technique). It creates new synthetic samples with values similar to the existing ones, until the smaller classes match the size of the class with the largest share.
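The core idea of SMOTE can be sketched in a few lines: pick a minority sample, pick one of its nearest minority neighbors, and create a new point somewhere on the segment between them. This is a simplified illustration that assumes numeric features, not the code used in the project; the helper names (`smote_sample`, `nearest_neighbors`, `oversample`) and the toy points are hypothetical. In practice a library such as imbalanced-learn would normally be used instead.

```python
import random


def smote_sample(point, neighbor, rng):
    """Create one synthetic point on the segment between two minority samples."""
    gap = rng.random()  # interpolation factor in [0, 1)
    return [p + gap * (n - p) for p, n in zip(point, neighbor)]


def nearest_neighbors(point, candidates, k):
    """Return the k candidates closest to point (squared Euclidean distance)."""
    def squared_distance(other):
        return sum((a - b) ** 2 for a, b in zip(point, other))

    others = [c for c in candidates if c is not point]
    return sorted(others, key=squared_distance)[:k]


def oversample(minority, target_size, k=2, seed=0):
    """Add synthetic points until the minority class reaches target_size."""
    rng = random.Random(seed)
    synthetic = []
    while len(minority) + len(synthetic) < target_size:
        point = rng.choice(minority)
        neighbor = rng.choice(nearest_neighbors(point, minority, k))
        synthetic.append(smote_sample(point, neighbor, rng))
    return minority + synthetic


# Hypothetical numeric minority-class samples (two features each).
minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0]]
balanced_minority = oversample(minority, target_size=8)
print(len(balanced_minority))
```

Because every synthetic point is an interpolation between two real minority samples, the new data stays inside the region the minority class already occupies.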

Balanced Class

balanced

Model Interpretation

Decision Tree

Model Parameters

  • max_depth: Maximum depth of the tree.
  • random_state: Seed that controls the randomness of the tree, so results are reproducible.

A decision tree works much like human reasoning, but it uses mathematical inequalities instead of questions, since it is easier for algorithms to handle numbers than words.
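To make the idea of "inequalities instead of questions" concrete, here is a hand-written two-level tree. It is only an illustration: the ordinal encoding (low=0, med=1, high=2), the thresholds, and the output labels are hypothetical and were not learned from the data set.

```python
# Hypothetical ordinal encoding for safety: low=0, med=1, high=2.
def toy_tree_predict(safety, persons):
    """Hand-written two-level tree; thresholds are illustrative, not learned."""
    # "Is the car unsafe?" becomes the inequality safety <= 0.5.
    if safety <= 0.5:
        return "reject"
    # "Does it fit only two people?" becomes persons <= 2.5.
    if persons <= 2.5:
        return "reject"
    return "accept"


print(toy_tree_predict(safety=2, persons=4))
print(toy_tree_predict(safety=0, persons=4))
```

A trained tree has the same shape, except that the algorithm chooses which feature and which threshold to test at each node.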

Selection of the Ideal Max Depth

max_depth
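One common way to choose `max_depth` is to try several values and compare their cross-validated scores. The sketch below assumes scikit-learn and uses synthetic data from `make_classification` as a stand-in for the real data set; the depth range of 1 to 10 and the 5-fold setting are assumptions, not the values used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with six features (an assumption for illustration).
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=0
)

scores_by_depth = {}
for depth in range(1, 11):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    # Mean accuracy over 5 cross-validation folds for this depth.
    scores_by_depth[depth] = cross_val_score(tree, X, y, cv=5).mean()

best_depth = max(scores_by_depth, key=scores_by_depth.get)
print(best_depth, round(scores_by_depth[best_depth], 3))
```

Scores usually rise with depth at first and then flatten or drop once the tree starts to overfit, so the depth where the curve levels off is a reasonable choice.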

Random Forest

It shares several parameters with the decision tree and adds a few of its own:

Model Parameters

  • n_estimators: Number of decision trees.
  • n_jobs: Number of CPU cores used in parallel.

The Random Forest algorithm works in a similar way to the decision tree. The difference is that a decision tree uses a single tree, whereas a Random Forest combines the predictions of many trees; in this project, up to 1000 estimators are considered.
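The way a forest combines its trees can be sketched as a majority vote. This is a simplified illustration: the three "trees" below are hypothetical hand-written rules on an encoded sample, not trained estimators, and some libraries average predicted probabilities instead of counting hard votes.

```python
from collections import Counter


def forest_predict(trees, sample):
    """Return the label predicted by most trees (majority vote)."""
    votes = [tree(sample) for tree in trees]
    return Counter(votes).most_common(1)[0][0]


# Three hypothetical "trees", each a single rule on an encoded sample.
trees = [
    lambda s: "accept" if s["safety"] > 0.5 else "reject",
    lambda s: "accept" if s["persons"] > 2.5 else "reject",
    lambda s: "accept" if s["buying"] <= 1.5 else "reject",
]

# Hypothetical encoded sample: two trees vote "accept", one votes "reject".
sample = {"safety": 2, "persons": 4, "buying": 3}
prediction = forest_predict(trees, sample)
print(prediction)
```

Because each tree sees the problem slightly differently, the errors of individual trees tend to cancel out in the vote.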

Ideal Number of Estimators

n_estimators

Support Vector Machine

Model Parameters

  • kernel: Kernel function used by the SVC.
  • degree: Degree of the polynomial kernel.

svc

It consists of finding the hyperplane that best separates the classes in the data set, according to the kernel chosen by the user. It works quite well for relatively small data sets.

Its disadvantage is that the variables must be scaled to make them comparable to each other, since these algorithms are very sensitive to the scale of the features.
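A common way to handle the scaling requirement is to chain the scaler and the classifier in a single pipeline, so the scaler is fitted on the training data only. The sketch below assumes scikit-learn and synthetic data; the `rbf` kernel, the split size, and the inflated first feature are assumptions for illustration, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption for illustration).
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=0
)
X[:, 0] *= 1000  # exaggerate one feature's scale on purpose

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# StandardScaler brings every feature to zero mean and unit variance
# before the SVC sees it.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(round(test_accuracy, 3))
```

Keeping the scaler inside the pipeline also means that new samples are scaled with the same statistics at prediction time.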

Selection of the Best Model

Criteria

  • The model must have balanced accuracy across every class.
  • The model must not fit only the training data.
  • The algorithm must perform well on data it has never seen.
  • The model must generalize well.
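The first criterion can be made concrete by computing balanced accuracy, the mean of the per-class recall, and comparing it with plain accuracy. This is a minimal sketch; the function name `balanced_accuracy` and the toy labels are hypothetical and do not come from the project's results.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical case: the model always predicts the majority class "a".
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 10

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print(plain, balanced)
```

Plain accuracy rewards the model for ignoring the minority class, while balanced accuracy exposes it, which is why it suits an imbalanced target.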

best_model

Conclusion

Although the Random Forest outperforms the other models, the Decision Tree has a fairly similar performance while being less computationally heavy and easier to explain.

Therefore, I consider it the winning model, and I will use it to develop the application.

Opening the Black Box

Decision Tree Plot

decision_tree

This plot shows only a shortened version of the decision tree, since the full tree is deeper.

Plot Importance

plot_importance

Finally, we deploy the application to production on a free server. Since the algorithm is not very computationally expensive, it can make predictions quickly on a CPU-only server. We use a free Heroku server so that other users can access and use the application.

Languages

Jupyter Notebook 98.8%, Python 1.1%, Shell 0.0%, Procfile 0.0%