dformoso / sklearn-classification

Data Science Notebook on a Classification Task, using sklearn and Tensorflow.


Census Income Dataset Classification

Data Science Notebook on a Classification Task

Objective

In the Jupyter Notebook included in this repository, we use the Census Income Dataset to predict whether an individual's income exceeds $50K/yr based on census data.

The Dataset can be found here:

The Notebook can be found here:

Companion Mindmap/Cheatsheet

This Jupyter Notebook has a companion Mindmap/Cheatsheet that lists most of the Data Science steps. It can be found at the following link:

Steps

In this Notebook, we'll perform:

  • Feature Exploration (Uni and Bi-variate)
  • Feature Imputation
  • Feature Selection
  • Feature Encoding
  • Feature Ranking
  • Machine Learning with sklearn and Tensorflow
  • Random Search
  • Accuracy, Precision, Recall, and f1 calculations
  • ROC Curve
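As a rough illustration of the "Accuracy, Precision, Recall, and f1 calculations" step, here is a minimal pure-Python sketch of how those four scores follow from the confusion-matrix counts. It is not the notebook's code: the function names and the labels below are made up for the example, and scikit-learn offers `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` in `sklearn.metrics` for the same purpose.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 1 means income > $50K, 0 means income <= $50K.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With these made-up labels there are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so all four scores come out to 0.75.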

Setup

This Notebook has been designed to be run on top of the Jupyter Tensorflow Docker instance found in the link below:

If you haven't downloaded Docker at this point, please visit:

Then, open a shell or terminal session and copy/paste the following:

docker run -itd \
  --restart always \
  --name jupyter \
  --hostname jupyter \
  -p 8888:8888 \
  -p 6006:6006 \
  jupyter/tensorflow-notebook:latest \
  start-notebook.sh --NotebookApp.token=''

Upon running the command, Docker will automatically pull the image it needs and start the container for us.

Give it a minute or so for Jupyter to start, and head to the following URL: http://localhost:8888

You should now have Jupyter running. If after a minute you can't reach the URL, check that the container is running correctly by typing:

# Check that the container is running
docker ps -a
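If you would rather poll for Jupyter than refresh the browser, the following is a minimal sketch using only the Python standard library. It assumes the `-p 8888:8888` port mapping from the `docker run` command above; the function names `jupyter_is_up` and `wait_for_jupyter` are hypothetical helpers for this example, not part of Jupyter or Docker.

```python
import http.client
import time
import urllib.error
import urllib.request

# Talk to localhost directly, ignoring any proxy configured in the environment.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def jupyter_is_up(url="http://localhost:8888", timeout=2.0):
    """Return True if an HTTP server answers at `url`, False otherwise."""
    try:
        with _opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timed out, or the reply was not valid HTTP.
        return False


def wait_for_jupyter(url="http://localhost:8888", attempts=30, delay=2.0):
    """Poll `url` up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if jupyter_is_up(url):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_for_jupyter()` returns `True` as soon as something answers on port 8888, and `False` if nothing does after roughly a minute, at which point `docker ps -a` is the next thing to check.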

Loading the Notebook

Download it from this link:

Go back to:

Troubleshooting Docker

Here are a few useful commands in case something goes wrong with your Docker container:

# Restart Jupyter Docker Container
docker restart jupyter

# Stop Jupyter Docker Container
docker stop jupyter

# Remove Jupyter Docker Container
docker rm jupyter


Screenshots

Feature Distribution Analysis

[Screenshot: feature distribution analysis]

Feature Cleaning

[Screenshot: feature cleaning]

Missing Values in Features

[Screenshot: missing values in features]
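The Census Income Dataset marks missing categorical values with a `?`. As a hedged illustration of the Feature Imputation step, the sketch below fills those entries with the most frequent observed value of the column. It is a pure-Python stand-in rather than the notebook's own approach; the helper name and the sample rows are invented for the example, and scikit-learn's `SimpleImputer(strategy="most_frequent")` covers the same idea.

```python
from collections import Counter

MISSING = "?"  # marker used for missing values in the census data


def impute_most_frequent(rows, column, missing=MISSING):
    """Return copies of `rows` with missing `column` values set to the column mode."""
    observed = [row[column] for row in rows if row[column] != missing]
    if not observed:
        # Nothing to learn the mode from, so leave the values untouched.
        return [dict(row) for row in rows]
    mode = Counter(observed).most_common(1)[0][0]
    return [
        {**row, column: mode} if row[column] == missing else dict(row)
        for row in rows
    ]


# Invented sample rows, shaped like a small slice of the dataset.
rows = [
    {"workclass": "Private", "age": 39},
    {"workclass": "?", "age": 50},
    {"workclass": "Private", "age": 38},
    {"workclass": "State-gov", "age": 53},
]
imputed = impute_most_frequent(rows, "workclass")
print([row["workclass"] for row in imputed])
```

The input rows are left unmodified, which makes it easy to compare the column before and after imputation.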

Bivariate Exploration

[Screenshot: bivariate exploration (two images)]

Feature Correlation

[Screenshot: feature correlation]

Feature Importance

[Screenshot: feature importance]

Feature PCA

[Screenshot: feature PCA]

Results from Machine Learning Algorithms

[Screenshot: results from machine learning algorithms]

ROC for each Algorithm

[Screenshot: ROC for each algorithm]
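To make the ROC step more concrete, here is a minimal pure-Python sketch of how the points of a ROC curve and the area under it can be computed from predicted scores. The function names, labels, and scores are made up for illustration and this is not the notebook's code; scikit-learn provides `roc_curve` and `roc_auc_score` in `sklearn.metrics` for real use.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs for a threshold sweep."""
    positives = sum(1 for label in y_true if label == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("ROC needs at least one positive and one negative label")

    # Lowering the threshold admits examples in order of descending score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]
    tp = fp = 0
    previous_score = None
    for score, label in ranked:
        # Emit a point only when the score changes, so tied scores share one step.
        if previous_score is not None and score != previous_score:
            points.append((fp / negatives, tp / positives))
        if label == 1:
            tp += 1
        else:
            fp += 1
        previous_score = score
    points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up labels (1 = income > $50K) and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
print(points)
print(auc(points))
```

For these made-up values, 8 of the 9 positive/negative pairs are ranked correctly, so the area under the curve is 8/9, or about 0.889.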

About Me

Twitter:

Linkedin:

Email:


License:GNU General Public License v3.0

