saaranshM / used-car-price-prediction

Different regression models to predict the price of a used car using scikit-learn

Used car price prediction using different regression models.

In this small project, I used the used car dataset from Kaggle to predict the selling prices of used cars.

Things Learnt

  • Dealing with incomplete data
  • Data Cleaning
  • Data visualization
  • Stratified sampling
  • Hyperparameter Tuning
  • Different regression models

Insights gained

From my initial handling of the dataset, I found that there were many missing values that had to be either removed or filled in. The dataset then had to be split using stratified sampling because of the wide range of cars and selling prices. The correlation between the selling price and the max power of the car was very strong, so I added two more attributes to the dataset that might help with predictions: bhp/cc and bhp/seat.
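The two derived attributes and the stratified split can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`max_power_bhp`, `engine_cc`, `seats`, `price_band`), the made-up sample rows, and the choice to stratify on three quantile bands of the selling price are all assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up sample rows; column names are hypothetical, not the Kaggle schema.
cars = pd.DataFrame({
    "selling_price": [120000, 150000, 300000, 450000, 500000, 650000,
                      900000, 1200000, 2500000, 4000000, 250000, 700000],
    "max_power_bhp": [47.0, 58.0, 67.0, 74.0, 82.0, 88.0,
                      103.0, 121.0, 170.0, 190.0, 62.0, 99.0],
    "engine_cc": [796, 998, 1197, 1248, 1197, 1396,
                  1498, 1582, 1968, 2143, 1086, 1493],
    "seats": [4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 7],
})

# Derived attributes: power relative to engine size and to seat count.
cars["bhp_per_cc"] = cars["max_power_bhp"] / cars["engine_cc"]
cars["bhp_per_seat"] = cars["max_power_bhp"] / cars["seats"]

# Bin the price into quantile bands (assumed stratification key) so that
# cheap, mid-range, and expensive cars all appear in both splits.
cars["price_band"] = pd.qcut(cars["selling_price"], q=3, labels=False)

train_set, test_set = train_test_split(
    cars, test_size=0.25, stratify=cars["price_band"], random_state=42
)

print(test_set["price_band"].value_counts())
```

Stratifying on a binned version of the target is one common way to keep a skewed price distribution balanced across the train and test sets; the number of bands is a judgment call.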

The dataset was then passed through three regression models:

  • Linear Regressor
  • Decision Tree Regressor
  • Random Forest Regressor

The mean and standard deviation (STD) for each model are as follows:

  • Linear Regressor: Mean 200428.7627, Standard Deviation: 51512.9411
  • Decision Tree Regressor: Mean 212996.7645, Standard Deviation: 68163.2524
  • Random Forest Regressor: Mean 172750.1854, Standard Deviation: 53068.7194
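The README does not say what these means and standard deviations are computed over. One plausible reading, assumed here, is the root mean squared error (RMSE) across cross-validation folds. The sketch below shows how such a summary could be produced for the three models. It runs on synthetic data from `make_regression`, not the Kaggle dataset, so its numbers are unrelated to the figures above; the fold count and model settings are also assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the prepared training data (assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)

models = {
    "Linear Regressor": LinearRegression(),
    "Decision Tree Regressor": DecisionTreeRegressor(random_state=42),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=50, random_state=42),
}

summary = {}
for name, model in models.items():
    # scikit-learn maximizes scores, so the MSE comes back negated.
    neg_mse = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=5)
    rmse = np.sqrt(-neg_mse)
    summary[name] = (rmse.mean(), rmse.std())
    print(f"{name}: Mean {rmse.mean():.4f}, Standard Deviation: {rmse.std():.4f}")
```

Under this reading, a lower mean indicates smaller typical prediction error, and the standard deviation indicates how much that error varies from fold to fold.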

From this we can see that the Random Forest Regressor fared a little better than the other two models. The predictions are okay but not great. This can most probably be improved by better handling of the missing data or by gathering more data, as this dataset consisted of roughly 8000 non-null rows.
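As one possible way to handle missing data without dropping rows, numeric gaps can be filled with a per-column median using scikit-learn's `SimpleImputer`. This is only a sketch of the idea: the column names and values are made up, and the README does not say which filling strategy the notebook used.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up rows with gaps; column names are hypothetical.
cars = pd.DataFrame({
    "max_power_bhp": [74.0, np.nan, 88.0, 103.0],
    "engine_cc": [1248.0, 1197.0, np.nan, 1498.0],
    "seats": [5.0, 5.0, 7.0, np.nan],
})

# Replace each missing value with the median of its column.
imputer = SimpleImputer(strategy="median")
filled = pd.DataFrame(imputer.fit_transform(cars), columns=cars.columns)

print(filled)
```

To avoid leaking information from the test set, the imputer would normally be fitted on the training split only and then applied to the test split with `transform`.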

Conclusion

This model is nowhere close to ready for deployment in the real world, but a bit more tuning of the dataset and more data will go a long way toward making it more accurate.

Languages

Language: Jupyter Notebook 100.0%