cars-engage-2022

Data Analysis

Developed an application to demonstrate how the Automotive Industry could harness data to make informed decisions.

Live demo here

Objective
Why this Project?
General Information about the Project
How to use the web application?
Technologies Used
Features
Room for Improvement
Installing/Contributing Guidelines
Contact

Objective

The ultimate goal of this project is to develop an application that the Automotive Industry could use to harness data to make informed decisions.
It helps identify the most popular cars on the basis of engine type, fuel type, mileage, model, etc.
It helps manufacturers spot trending car designs and use those reports to adjust their own designs, making them more optimized, customer-centric, and innovative.
It also lets them explore the relationships between different parameters, such as how prices vary with the independent variables. With the help of customer segmentation, they can find which group of people prefers which features.

Why this Project?

Quick and accurate data analysis is essential in the automotive industry. The amount of data available is huge, so producing reports from it manually is tedious and prone to calculation errors. These complexities make traditional approaches to analysis impractical, which led to the development of this application. With the help of Machine Learning concepts, we can find out how one feature varies with respect to others accurately and in very little time.

General Information about the Project

This project uses the concept of Data Analysis to demonstrate how the Automotive Industry could harness data to make informed decisions.

I used this dataset: cars

Since the dataset had missing values, I first deleted the columns with more than 60% missing values.
Next, I filled the remaining missing values with the mode of each column.
Then, I used bar plots, box plots, and scatter plots to display the relationships between different attributes (features of the cars).
After that, I used an Elbow plot to find the optimal value of K for K-Means Clustering, so as to group the people on the basis of Displacement and Ex-Showroom Price.
Before applying K-Means Clustering, I scaled the values of Displacement and Ex-Showroom Price so that both features are on a comparable scale and neither one dominates the distance calculation.
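
The cleaning and scaling steps above can be sketched roughly as follows. This is a minimal illustration with pandas on a tiny made-up table, not the project's actual code: the helper names clean_missing and standardize, the toy values, and the choice of z-score scaling are all assumptions.

```python
import pandas as pd


def clean_missing(df: pd.DataFrame, max_missing: float = 0.60) -> pd.DataFrame:
    """Drop columns with more than `max_missing` share of missing values,
    then fill the remaining gaps with each column's mode."""
    missing_share = df.isna().mean()  # fraction of missing values per column
    kept = df.loc[:, missing_share <= max_missing].copy()
    for col in kept.columns:
        modes = kept[col].mode(dropna=True)
        if not modes.empty:
            # mode() can return several values on a tie; take the first one
            kept[col] = kept[col].fillna(modes.iloc[0])
    return kept


def standardize(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Z-score scaling (assumed here): subtract the mean, divide by the std."""
    scaled = df.copy()
    for col in columns:
        scaled[col] = (df[col] - df[col].mean()) / df[col].std(ddof=0)
    return scaled


# Made-up rows for illustration only; not taken from the real dataset.
cars = pd.DataFrame({
    "Make": ["Tata", "Tata", None, "Hyundai"],
    "Displacement": [1199.0, 1497.0, None, 1197.0],
    "Ex-Showroom Price": [700000.0, 900000.0, 650000.0, 720000.0],
    "Mostly_Empty": [None, None, None, 1.0],  # 75% missing, so it is dropped
})

cleaned = clean_missing(cars)
scaled = standardize(cleaned, ["Displacement", "Ex-Showroom Price"])
print(scaled)
```

After scaling, both columns have mean 0 and unit standard deviation, so a difference of a few hundred cc and a difference of a few lakh rupees contribute comparably to the distances used by K-Means.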

The table below shows the statistical summary of the dataframe.
It includes the count, mean, standard deviation, minimum value, maximum value, and percentile values (including the median, or 50th percentile) of the columns.

The Bar Plot below shows the top 10 most popular Makes of Cars.
Maruti Suzuki is the most popular Make, followed by Hyundai, Mahindra, Tata, and so on.
Skoda and Ford have equal counts.
BMW and Renault have nearly equal counts.
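
The "top 10" bar plots are based on counting how often each value occurs in a column. A minimal sketch of that counting step, using only the standard library and a made-up list of Makes (the helper name top_n is hypothetical, and the list is not the real data):

```python
from collections import Counter


def top_n(values, n=10):
    """Return the n most frequent values with their counts, most frequent first."""
    counts = Counter(v for v in values if v is not None)  # skip missing entries
    return counts.most_common(n)


# Made-up sample for illustration only.
makes = ["Maruti Suzuki", "Hyundai", "Maruti Suzuki", "Tata", "Hyundai", "Maruti Suzuki"]

top_makes = top_n(makes, n=2)
print(top_makes)  # [('Maruti Suzuki', 3), ('Hyundai', 2)]
```

The same helper would apply to the Model, Fuel Type, and Body Type columns.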

The Bar Plot below shows the top 10 most popular Models of Cars.
Nexon is the top model, with Kuv100 Nxt as a close competitor.
Compass and Xuv500 have equal counts.
Seltos and Innova Crysta have equal counts.
Ciaz and Swift have equal counts.

The Bar Plot below shows the popular Types (transmission types) of Cars.
Manual cars are preferred over Automatic, AMT, DCT, and CVT.

The Bar Plot below shows the popular Fuel Types of Cars.
There are more cars running on Petrol and Diesel than on CNG, Hybrid, Electric, and CNG+Petrol.
CNG, Hybrid, and Electric have equal counts.

The Bar Plot below shows the top 10 most popular Body Types of Cars.
SUV is the most popular body type.
Coupe, MUV, and MPV have equal counts.

The Box plot below shows that Renault has the highest Ex-Showroom Price and median.
Tata has the lowest Ex-Showroom Price and median.
Renault is the most preferred Make among the middle 50% of the sample.

The Box plot below shows that the Ex-Showroom Price of Automatic cars is higher than that of Manual cars.
The middle 50% of the sample prefers Manual over Automatic cars.
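
For reading these box plots: the box spans the first quartile (Q1) to the third quartile (Q3), i.e. the middle 50% of the prices in each group, and the line inside the box is the median. A small sketch with made-up prices (in lakh rupees, not values from the dataset), using only the standard library; the helper name box_stats is hypothetical:

```python
from statistics import quantiles

# Made-up Ex-Showroom Prices in lakh rupees, for illustration only.
prices_by_type = {
    "Manual": [4, 5, 6, 7, 8],
    "Automatic": [8, 10, 12, 14, 20],
}


def box_stats(values):
    """Return (Q1, median, Q3), the three values that define the box."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return q1, median, q3


summary = {name: box_stats(vals) for name, vals in prices_by_type.items()}
for name, (q1, median, q3) in summary.items():
    print(f"{name}: Q1={q1}, median={median}, Q3={q3}, IQR={q3 - q1}")
```

In this toy example the Automatic box sits entirely above the Manual box, which is the kind of pattern the price comparison above describes.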

The Box plot below shows that the Ex-Showroom Price of the Kwid Model is the highest.
The middle 50% of the sample does not prefer Alto K10.
Nano Genx has the lowest Ex-Showroom Price.

The Box plot below shows that cars running on Petrol are the most preferred.

The Box plot below shows that the Hatchback Body Type is preferred over the MPV Body Type by the middle 50% of the sample.

The Bar plot below shows that Renault has the highest Ex-Showroom Price.
Maruti Suzuki is a close competitor of Renault.

The City Mileage of Manual cars is higher than that of Automatic cars.

Eeco has the highest Displacement.
Redi-Go, Kwid, and Alto K10 have nearly equal Displacement.

This is an Elbow Plot, used to find the optimal value of k for K-Means Clustering.
Since the elbow is formed at k=3, the optimal value of k is 3, i.e. we'll have 3 clusters.
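
The elbow method can be sketched as follows: fit K-Means for several values of k, record the within-cluster sum of squares (inertia), and look for the k after which the curve flattens. This is only an illustration that assumes scikit-learn and uses synthetic points as a stand-in for the scaled (Displacement, Ex-Showroom Price) pairs; it is not the project's actual code or data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: three well-separated groups of 2-D points.
points, _ = make_blobs(
    n_samples=150,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.5,
    random_state=0,
)
scaled_points = StandardScaler().fit_transform(points)

# Inertia (within-cluster sum of squared distances) for k = 1..6.
inertias = []
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(scaled_points)
    inertias.append(model.inertia_)

for k, inertia in enumerate(inertias, start=1):
    print(f"k={k}: inertia={inertia:.2f}")

# The curve drops sharply up to k=3 and flattens afterwards, so fit with k=3.
final_model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaled_points)
labels = final_model.labels_
```

Plotting inertias against k gives the elbow plot, and coloring a scatter plot of the two features by labels gives the cluster plot.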

The scatter plot below shows the 3 clusters formed by applying K-Means Clustering to Displacement and Ex-Showroom Price.

How to use the web application?

Click on this link: Information & Visualization

The above link leads to an interface where you can check the following information about the dataset:

The number of rows and columns.
The datatype of each column.
The number and percentage of missing values in each column.
Head and tail of the dataframe.
Statistical description.
Box plots, bar plots, and scatter plots between different columns, to understand how they relate.

Click on this link: Customer Segmentation

The above link leads to an interface where you can check the following information:

The head of the dataframe after scaling Displacement and Ex-Showroom Price.
Elbow Plot (used to find the optimal value of k).
Scatter plot between Displacement and Ex-Showroom Price (after using K-Means Clustering).

Features

The following information can be found using my application:

The most popular cars by Make, Model, Fuel Type, Body Type, etc.
How exactly the prices vary with the independent variables.
Customer segments.

Room for Improvement

Since the aim of the project was data analysis, I have not used any algorithm to predict the price of a car. However, algorithms like Linear Regression and Random Forest could be used to predict car prices.

Contact

😄 Created by - Shweta Bhagat

📧 Email : bhagatshweta0216@gmail.com

Feel free to contact me!

Shweta2024 / cars-engage-2022