saktheeswaranswan / svm-multi-calss-machine-learnig-example

MNIST Digits - Classification Using SVM

In this notebook, we'll explore the popular MNIST dataset and build an SVM model to classify handwritten digits. The dataset is described under Data Description below.

This is part of a Kaggle competition: https://www.kaggle.com/c/digit-recognizer.

Objective

We will develop a Support Vector Machine model that classifies handwritten digits from 0-9 using their pixel values as features. This is therefore a 10-class classification problem.
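An SVM is a binary classifier at heart, so a 10-class problem has to be decomposed into binary ones. The sketch below shows how scikit-learn's `SVC` does this with one-vs-one pairs, giving 10 x 9 / 2 = 45 pairwise classifiers. It assumes scikit-learn is installed and uses its small bundled 8 x 8 digits set (`load_digits`) as a stand-in, not the Kaggle MNIST files.

```python
from sklearn.datasets import load_digits
from sklearn.svm import SVC

# Stand-in data (assumption): scikit-learn's bundled 8x8 digits, not Kaggle MNIST.
X, y = load_digits(return_X_y=True)

# decision_function_shape="ovo" exposes one score per pair of classes.
clf = SVC(kernel="linear", decision_function_shape="ovo").fit(X, y)

n_classes = len(clf.classes_)
n_pairs = clf.decision_function(X[:1]).shape[1]
print(n_classes, n_pairs)  # number of classes, number of pairwise classifiers
```

With the default `decision_function_shape="ovr"`, the same pairwise models are trained internally, but the scores are aggregated into one column per class.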

Data Description

For this problem, we use the MNIST data, a large database of handwritten digits. The pixel values of each digit image are the features, and the digit it depicts (0-9) is the label.

Since each image is 28 x 28 pixels and each pixel is a feature, there are 784 features. MNIST digit recognition is a well-studied problem in the ML community, and people have trained numerous models (neural networks, SVMs, boosted trees, etc.), achieving error rates as low as 0.23% (i.e. 99.77% accuracy, with a convolutional neural network).

Before neural networks became popular, though, models such as SVMs and boosted trees were the state of the art for such problems.
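To make the feature count concrete, here is a minimal sketch of how a 28 x 28 image becomes a row of 784 features and back again. The image is synthetic (an assumption for illustration), and NumPy is assumed to be available.

```python
import numpy as np

# Synthetic stand-in for one grayscale digit image (values 0-255).
image = (np.arange(28 * 28) % 256).reshape(28, 28)

# Flatten row by row: each pixel becomes one feature.
features = image.reshape(-1)
n_features = features.shape[0]

# The flattening is lossless, so the image can be restored for plotting.
restored = features.reshape(28, 28)

# Error rate and accuracy are complements of each other.
error_rate_pct = 0.23
accuracy_pct = round(100 - error_rate_pct, 2)

print(n_features, accuracy_pct)
```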

We'll first explore the dataset, prepare it (scaling etc.), and then experiment with linear and non-linear SVMs under various hyperparameter settings.
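The workflow just described (scale, then compare a linear and a non-linear SVM) can be sketched roughly as follows. This is not the notebook's actual code: it assumes scikit-learn, uses the bundled 8 x 8 digits set as a stand-in for MNIST, and the values `C=1.0`, `gamma="scale"` and the 70/30 split are illustrative choices only.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): 8x8 digits bundled with scikit-learn.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

scores = {}
for kernel in ("linear", "rbf"):
    # Putting the scaler in the pipeline fits it on the training data only.
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0, gamma="scale"))
    model.fit(X_train, y_train)
    scores[kernel] = model.score(X_test, y_test)  # test-set accuracy
    print(kernel, round(scores[kernel], 3))
```

Hyperparameters such as `C` and `gamma` could then be tuned with `GridSearchCV` wrapped around the same pipeline.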

NOTE:

Given the computational limitations of the system and the size of the data, we use only 50% of the available dataset for model building.

The final model can be extended to the complete dataset, provided sufficient computational power is available.
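One way to take the 50% subset is a stratified sample, which keeps the class proportions of the full data. The sketch below is an assumption about how this could be done, not necessarily how the notebook does it. It uses synthetic data with 784 features and assumes NumPy and scikit-learn.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 1000 "images" of 784 pixels, 100 per digit class.
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(1000, 784))
y = np.repeat(np.arange(10), 100)

# Keep half of the rows, preserving the per-class proportions.
X_half, _, y_half, _ = train_test_split(
    X, y, train_size=0.5, stratify=y, random_state=0
)

class_counts = np.bincount(y_half)
print(X_half.shape, class_counts)
```

Setting `train_size=1.0` is not allowed by `train_test_split`, so training on the complete dataset means skipping this step rather than changing the fraction.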
