Overview

Emissions are among the most pressing concerns of our era. High emissions have many effects that we can feel, including air pollution and climate change. As emissions from human activities increase, they build up in the atmosphere and warm the climate, leading to many other changes around the world: in the atmosphere, on land, and in the oceans (Climate Change Indicator). This can be a big problem for all of us.

Transportation is one of the factors that produces a lot of emissions, so this notebook predicts emissions from transportation to help reduce the number of high-emission vehicles.

Exploratory Data Analysis

Use data.info() to see information about each column; the dataset has 73585 rows and 12 columns.
Use data.isnull().sum() to check for null or missing values in the dataset.
Because we only need a few columns (Engine Size, Cylinders, Fuel Type, Fuel Consumption City, Fuel Consumption Highway (Hwy), and CO2 Emissions (g/km)), we remove the rest using data = data.drop([columns], axis=1).
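
The three steps above can be sketched on a tiny stand-in DataFrame. The column names and values below are hypothetical placeholders, not the real dataset, and "Make" merely stands in for the unused columns.

```python
import pandas as pd

# Hypothetical miniature of the dataset; the real column names may differ.
data = pd.DataFrame({
    "Make": ["A", "B", "C"],
    "Engine Size(L)": [2.0, 3.5, 1.6],
    "Cylinders": [4, 6, 4],
    "Fuel Type": ["X", "Z", "D"],
    "Fuel Consumption City (L/100 km)": [9.9, 12.7, 7.1],
    "Fuel Consumption Hwy (L/100 km)": [6.7, 9.1, 5.4],
    "CO2 Emissions(g/km)": [196, 255, 170],
})

data.info()                    # prints dtypes and non-null counts per column
missing = data.isnull().sum()  # Series: number of missing values per column

# "Make" stands in for the columns the model does not need.
unused_columns = ["Make"]
data = data.drop(unused_columns, axis=1)
```

Passing the list of unused column names with axis=1 drops columns rather than rows; data.drop(columns=unused_columns) is an equivalent spelling.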

To make the fuel types readable, we replace the single-letter codes with the actual fuel type names.

from:

to:
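
The renaming step described above could be sketched as follows. This README only confirms that regular gasoline and natural gas are among the fuel types; the letter codes and the remaining names in this mapping are assumptions and should be checked against the dataset's own documentation.

```python
import pandas as pd

# Assumed code-to-name mapping (not confirmed by this README).
fuel_names = {
    "X": "Regular Gasoline",
    "Z": "Premium Gasoline",
    "D": "Diesel",
    "E": "Ethanol (E85)",
    "N": "Natural Gas",
}

# Toy column of single-letter fuel codes.
data = pd.DataFrame({"Fuel Type": ["X", "Z", "D", "E", "N", "X"]})

# Replace each code with its readable name; unknown codes are left unchanged.
data["Fuel Type"] = data["Fuel Type"].replace(fuel_names)
```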

Visualize the total number of vehicles for each fuel type

This shows that regular gasoline is the fuel type most vehicles use and natural gas is the fuel type the fewest vehicles use
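
As a minimal sketch of how such totals can be obtained, value_counts() tallies each fuel type before plotting. The series below is invented toy data, chosen only so that the ordering matches the observation above.

```python
import pandas as pd

# Invented toy data, not the real distribution.
fuel = pd.Series(
    [
        "Regular Gasoline", "Regular Gasoline", "Regular Gasoline",
        "Premium Gasoline", "Premium Gasoline",
        "Diesel", "Diesel",
        "Natural Gas",
    ],
    name="Fuel Type",
)

counts = fuel.value_counts()    # number of vehicles per fuel type
most_common = counts.idxmax()   # label with the highest count
least_common = counts.idxmin()  # label with the lowest count

# counts.plot(kind="bar") would draw the totals if matplotlib is installed.
```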

Then we plot the correlation between the numerical columns using scatter plots

Data Preprocessing

Because the fuel type column contains non-numerical values, we need to encode it into numerical values using pd.get_dummies()
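
A small illustration of the encoding, using a made-up two-column frame: pd.get_dummies() replaces the categorical column with one indicator column per category.

```python
import pandas as pd

# Made-up rows for illustration only.
data = pd.DataFrame({
    "Cylinders": [4, 6, 4],
    "Fuel Type": ["Regular Gasoline", "Diesel", "Regular Gasoline"],
})

# One indicator column per distinct fuel type; numeric columns are untouched.
encoded = pd.get_dummies(data, columns=["Fuel Type"])
```

Here encoded keeps Cylinders and gains the columns "Fuel Type_Diesel" and "Fuel Type_Regular Gasoline".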

Define x and y: x contains all independent variables and y contains the label (the dependent variable)
Split x and y into x_train, x_test, y_train, and y_test using train_test_split; in this case I use 80% for train_size and 20% for test_size
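
The 80/20 split can be sketched like this. The arrays are placeholders, and random_state is an assumption added here only to make the example reproducible; the README does not say whether one was set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features (10 rows, 2 columns) and labels.
x = np.arange(20).reshape(10, 2)
y = np.arange(10)

# 80% of the rows go to training, 20% to testing.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, train_size=0.8, test_size=0.2, random_state=42  # random_state is assumed
)
```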

Modeling

For modeling I use 4 models:

Linear Regression with estimator LinearRegression(fit_intercept=False, n_jobs=30)
Ridge Regression with estimator Ridge(alpha=2.0, solver='svd')
Random Forest Regression with estimator RandomForestRegressor(max_depth=50, max_features=None, min_samples_split=8)
Neural Network with layers like this:
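
The following sketch covers only the three scikit-learn estimators listed above, built with the stated parameters and fitted on synthetic data. The neural network is left out because its layers are not given in the text, and the training data here is randomly generated for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic stand-in for the training data (not the real dataset).
rng = np.random.default_rng(0)
x_train = rng.uniform(1, 8, size=(200, 3))
y_train = x_train @ np.array([20.0, 5.0, 10.0]) + rng.normal(0, 1, 200)

# Estimators with the parameters listed in this README.
models = {
    "linear": LinearRegression(fit_intercept=False, n_jobs=30),
    "ridge": Ridge(alpha=2.0, solver="svd"),
    "forest": RandomForestRegressor(
        max_depth=50, max_features=None, min_samples_split=8
    ),
}

# Fit every estimator on the same training data.
for model in models.values():
    model.fit(x_train, y_train)
```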

Result

For the result I got accuracy and MAE for each model like this:

We can see that Random Forest Regression achieves the best MAE and accuracy. I also saved my models as pickle files, and in js format for the TensorFlow (deep learning) model
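
A hedged sketch of the evaluation and saving steps for a scikit-learn model follows. It assumes that "accuracy" means the R² value returned by the estimator's score() method, which the README does not confirm, and it uses synthetic data, so the numbers it produces say nothing about the real results. The TensorFlow export is not shown.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
x = rng.uniform(1, 8, size=(200, 3))
y = x @ np.array([20.0, 5.0, 10.0])

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)
model = RandomForestRegressor(random_state=0).fit(x_train, y_train)

mae = mean_absolute_error(y_test, model.predict(x_test))  # mean absolute error
r2 = model.score(x_test, y_test)  # R^2, assumed to be the reported "accuracy"

# Serialize to bytes and restore; pickle.dump(model, file) would write to disk.
restored = pickle.loads(pickle.dumps(model))
```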
