Car Evaluation

Definition of Problem

Features of Dataset

Buying Degree of purchase of the vehicle.
Maint Degree of vehicle maintenance.
Doors Number of vehicle doors.
Persons Number of people that fit in the car.
Lug Boot Car trunk size.
Safety Degree of vehicle safety.

Almost all variables are well balanced.

Except for the class tag which is the variable to predict. Therefore, our model can be biased and give the prediction preference to the class that is in greater proportion.

Proyect Summary

Approach

The class variable has a clear imbalance. Therefore we will use the SMOTE transformation. Which consists in creating new data similar values. According to the percentage of class that is in greater proportion.

Balanced Class

Model Interpration

Decesion Tree

Model Parameters

max_depth: Maximum depth of the tree.
random_state: Random tree state

It works very similar to human logic. Use mathematical inequalities instead of questions. Since it is easier for algorithms to handle numbers than words.

Selection of Max Depth Ideal

Random Forest

Share several in common parameters. Except for a few:

Model Parameters

n_estimators: Number of decision trees.
n_jobs: Number of working cores of the CPU.

The Random Forest algorithm works in a similar way to the decision tree. While decision tree you can only use one tree. With Random Forest you can use a maximum amount of 1000 estimators.

Number of Estimators Ideal

Support Vector Machine

Model Parameters

kernel: SVC core.
degree: Degree of the polynomial.

It consists of finding the best hyperplane that fits the data set. According to the kernel provided by the user. It works quite well for relatively small data sets.

It has the disadvantage of requiring a scalar setting for variables. In order to make them comparable to each other. Since these algorithms are very sensitive.

Selection of the Best Model

Criteria

The model must have balanced accuracy. For each case.
The model must not only be adapted to the set of training deals.
The algorithm must have good performance. For data you've never seen.
Must have a high percentage of generalization.

Conclusion

Although the Random Forest outperforms the other models. The Decesion Tree has a fairly similar performance. Also be less computationally heavy and easier to explain.

Therefore I consider it as the winning model. Which I will use to develop the application.

Opening the Black Box

Decesion Tree Plot

It is just a short form of the decision tree. Since it has more depth.

Plot Importance

Finally we send the application to production. To a free server since the algorithm is not very computationally expensive. Therefore, it can make predictions with speed on a server that uses CPU. We use a free Herouku server, so that other users can access and use the application.

cvargas-xbrein / Car-Evaluation-Proyect

Car Evaluation

Definition of Problem

Features of Dataset

Proyect Summary

Approach

Balanced Class

Model Interpration

Decesion Tree

Model Parameters

Selection of Max Depth Ideal

Random Forest

Model Parameters

Number of Estimators Ideal

Support Vector Machine

Model Parameters

Selection of the Best Model

Criteria

Conclusion

Opening the Black Box

Decesion Tree Plot

Plot Importance

About

Languages