tsurrdurr / MNIST-classification

My take on the infamous exercise in machine learning

MNIST-classification

An attempt at exploring scikit-learn functionality and machine learning algorithms using samples provided by the MNIST database.

Installation

I recommend using virtualenv and setting up a new environment for this project.

Python version used is Python 3.6.

Linux installation:
You may simply run the following code to install all the dependencies:
pip install -r requirements.txt

Windows installation:
On Windows, installation from requirements.txt doesn't quite work for me, but this link helped me a lot with Windows wheels. Download the following packages for your system and Python 3.6:
numpy+mkl
scikit-learn
scipy
and install them with:
pip install .\wheel_name.whl

Running

Download the 4 MNIST database files from here and put them all in a folder where temporary files may be stored. Do not extract the archives.
Set the root variable in read_data.py to the folder where you have put the MNIST samples.
Run read_data.py, then classification.py with the desired parameter, and finally prediction.py.

read_data.py

Reads the training data in the format described at the bottom of the official MNIST page.

The resulting arrays containing the images and their labels are saved with the scikit-learn tool joblib (so you only have to do this once).
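
For readers who want to see what parsing that format involves, here is a minimal sketch. It is not the code from read_data.py: the function names are made up for illustration, and it assumes the archives are read with the standard gzip module. What it does rely on is the documented layout of the files: a big-endian header (magic number 2051 plus count, rows and columns for images; magic number 2049 plus count for labels) followed by one unsigned byte per pixel or label.

```python
import gzip
import struct

import numpy as np


def read_gz(path):
    """Return the decompressed bytes of a .gz archive without extracting it to disk."""
    with gzip.open(path, "rb") as archive:
        return archive.read()


def parse_idx_images(raw):
    """Parse decompressed image-file bytes into an array of shape (count, rows, cols)."""
    # Header: four big-endian unsigned 32-bit integers.
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:
        raise ValueError("unexpected magic number %d for an image file" % magic)
    pixels = np.frombuffer(raw, dtype=np.uint8, offset=16)
    return pixels.reshape(count, rows, cols)


def parse_idx_labels(raw):
    """Parse decompressed label-file bytes into a 1-D array of digits."""
    # Header: two big-endian unsigned 32-bit integers.
    magic, count = struct.unpack(">II", raw[:8])
    if magic != 2049:
        raise ValueError("unexpected magic number %d for a label file" % magic)
    labels = np.frombuffer(raw, dtype=np.uint8, offset=8)
    if labels.size != count:
        raise ValueError("header announces %d labels, found %d" % (count, labels.size))
    return labels
```

A call such as parse_idx_images(read_gz(path_to_image_archive)) would then give the image array, which can be stored with joblib.dump so the parsing only happens once.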

classification.py

Creates a classifier: an object capable of determining the class of an input data object.

List of classification.py parameters:
-svc - use Linear SVC classifier
-sgd - use SGD classifier
-nb - use Multinomial Naive Bayes classifier
-kn - use KNeighbors classifier

The arrays from earlier are read from disk and reshaped with numpy tools. Then a classifier of the selected type is created based on the input images and their classes (0-9). After this, the accuracy of the classifier is displayed (percentage of correctly classified values from the training set).
The classifier is also saved to disk.
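
The steps above can be sketched roughly as follows. This is an illustration, not the contents of classification.py: the CLASSIFIERS table and the train function are hypothetical names, and every classifier is created with scikit-learn's default hyperparameters, which may differ from what the script actually uses.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

# Hypothetical mapping from the command-line parameters to scikit-learn classes.
CLASSIFIERS = {
    "-svc": LinearSVC,
    "-sgd": SGDClassifier,
    "-nb": MultinomialNB,
    "-kn": KNeighborsClassifier,
}


def train(flag, images, labels):
    """Fit the classifier selected by flag and return it with its training-set accuracy."""
    # scikit-learn expects one row per sample, so each 2-D image is flattened.
    samples = np.asarray(images).reshape(len(images), -1)
    classifier = CLASSIFIERS[flag]()  # default hyperparameters (an assumption)
    classifier.fit(samples, labels)
    # score() returns the fraction of correctly classified samples, here on the training set.
    accuracy = classifier.score(samples, labels)
    return classifier, accuracy
```

The returned classifier can be written to disk with joblib.dump and loaded again later with joblib.load. Multiplying the returned accuracy by 100 gives the percentage mentioned above.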

prediction.py

Tests the generated classifier with test data.

The classifier from earlier is read from disk, and the test values are parsed according to the MNIST format description. The predict function of the classifier returns an array of predicted values for the test images. These predictions are then compared to the test labels.
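
The comparison step boils down to counting matching entries. A small sketch, with accuracy_percent as a hypothetical helper name rather than a function from prediction.py:

```python
import numpy as np


def accuracy_percent(predicted, expected):
    """Return the percentage of positions where the prediction equals the true label."""
    predicted = np.asarray(predicted)
    expected = np.asarray(expected)
    if predicted.shape != expected.shape:
        raise ValueError("predictions and labels must have the same shape")
    # Comparing element-wise gives booleans; their mean is the fraction of correct answers.
    return 100.0 * np.mean(predicted == expected)
```

With a loaded classifier, the input would be something like accuracy_percent(classifier.predict(test_samples), test_labels), where test_samples holds one flattened image per row.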

For now, ~91.4% of test values are recognized with Linear SVC.
The KNeighbors prediction test shows 96.5% accuracy, but creating that classifier takes a lot of time and disk space.
