ML_Project_DigitRecognition_USPS

Extraction Methods of Handwritten Digit Recognition Tested on the USPS Database

Handwritten digit recognition has recently attracted considerable interest among researchers because of advances in machine learning, deep learning and computer vision algorithms. This report describes a handwritten digit recognition system and a feature extraction method based on the shape of the digit. The method is tested on the USPS database of isolated handwritten digits (20000 images for training and 1500 images for testing). I compare the results of some of the most widely used machine learning methods: SVM, KNN, PCA, LDA, QDA, LRC, MLP, ABC, DTC, HOG + SVM, Bayesian, SGD and RFC. This work achieves a success rate of approximately 80% on the USPS database.

Our model works in three steps: 1) preprocessing, 2) feature extraction and 3) classification. In preprocessing, we apply basic image processing to separate the digits from real samples, or we prepare the data from the dataset by reshaping the images into vectors. In the second step, we extract features that serve as a highly distinguishable descriptor for digit recognition: we divide an input image into fixed 28*28 cells and represent each digit with a feature vector.

The classification methods used in this report include the following. QDA (Quadratic Discriminant Analysis) is a classifier with a quadratic decision boundary, generated by fitting class-conditional densities to the data and applying Bayes' rule. RFC (Random Forest Classifier) is a meta-estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve predictive accuracy and control over-fitting. DTC (Decision Tree Classifier) repeatedly divides the feature space into sub-regions by identifying separating lines; the division is repeated because two distant regions of the same class may be separated by a region of another class. MLP (Multilayer Perceptron) is a classifier that optimizes the log-loss function using LBFGS or stochastic gradient descent. SVM (Support Vector Machine) is a discriminative classifier formally defined by a separating hyperplane: given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane that categorizes new examples.
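
The three steps can be sketched end to end with scikit-learn. This is a minimal illustration under stated assumptions, not the pipeline used for the reported results: `load_digits` (the 8x8 digit images bundled with scikit-learn) stands in for USPS so that the block runs offline, `cell_means` is a hypothetical feature extractor that averages each 2x2 cell, and all hyperparameters are arbitrary choices.

```python
from sklearn.datasets import load_digits
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def cell_means(images, cell=2):
    """Hypothetical feature extractor: mean intensity of each cell x cell block."""
    n, h, w = images.shape
    blocks = images.reshape(n, h // cell, cell, w // cell, cell)
    return blocks.mean(axis=(2, 4)).reshape(n, -1)


# 1) Preprocessing: load 8x8 images (stand-in for USPS) and scale them to [0, 1].
digits = load_digits()
images = digits.images / 16.0  # load_digits stores intensities in 0..16

# 2) Feature extraction: one 16-dimensional feature vector per digit.
features = cell_means(images, cell=2)

X_train, X_test, y_train, y_test = train_test_split(
    features, digits.target, test_size=0.2, random_state=0, stratify=digits.target
)

# 3) Classification: fit several of the classifiers named above and compare them.
classifiers = {
    "QDA": QuadraticDiscriminantAnalysis(reg_param=0.1),  # regularized for stability
    "RFC": RandomForestClassifier(n_estimators=100, random_state=0),
    "DTC": DecisionTreeClassifier(random_state=0),
    "MLP": MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0),
    "SVM": SVC(kernel="rbf"),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)  # mean accuracy on the held-out set
    print(f"{name}: {scores[name]:.3f}")
```

Keeping the classifiers in a dictionary makes it easy to add or remove methods without touching the evaluation loop.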

Digit recognition on the USPS dataset involves training a machine learning model to accurately classify handwritten digits. The USPS dataset consists of images of handwritten digits ranging from 0 to 9.

Here's a step-by-step guide to performing digit recognition on the USPS dataset in Python:

Import the required libraries:

import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

Load the USPS dataset using the fetch_openml function from scikit-learn:

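# as_frame=False returns NumPy arrays, which the indexing and reshaping steps later on rely on
usps = fetch_openml('usps', version=2, as_frame=False)

Preprocess the data. Separate the features (pixel values) and targets (digit labels):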

X = usps.data.astype('float64')
y = usps.target.astype('int64')

Split the data into training and testing sets:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
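
A purely random split does not guarantee that every digit appears in the same proportion in both sets. The sketch below shows the `stratify` argument of `train_test_split` on synthetic data; the array names and sizes are illustrative assumptions, not the USPS data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X_demo = rng.random((1000, 256))         # 1000 fake 16x16 images, flattened
y_demo = np.repeat(np.arange(10), 100)   # exactly 100 samples per digit

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Count how many samples of each digit ended up in each set.
train_counts = np.bincount(y_tr, minlength=10)
test_counts = np.bincount(y_te, minlength=10)
print("train:", train_counts)
print("test: ", test_counts)
```

With stratification, each digit keeps the same 80/20 proportion in the two sets.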
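
Normalize the pixel values to the range 0 to 1. A scaler fitted on the training set is used so that the step does not depend on the range in which the pixels are stored:

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()  # learns the per-pixel minimum and maximum from the training set only
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)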
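
Min-max scaling maps each pixel to [0, 1] using the minimum and maximum observed in the training set, whatever range the raw values are stored in. Below is a minimal NumPy sketch of the computation; the function names and the toy data are hypothetical.

```python
import numpy as np


def fit_min_max(train):
    """Return the per-column minimum and span, computed on the training data only."""
    lo = train.min(axis=0)
    hi = train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero on constant columns
    return lo, span


def apply_min_max(data, lo, span):
    """Scale data with statistics that were learned elsewhere."""
    return (data - lo) / span


train = np.array([[-1.0, 0.0, 5.0],
                  [1.0, 0.0, 15.0],
                  [0.0, 0.0, 10.0]])
test = np.array([[0.5, 0.0, 20.0]])

lo, span = fit_min_max(train)
train_scaled = apply_min_max(train, lo, span)
test_scaled = apply_min_max(test, lo, span)
print(train_scaled)
print(test_scaled)
```

Test values can fall outside [0, 1] when they exceed the range seen during training, as the last column of `test_scaled` does here. This is expected, because the test set must not influence the scaling statistics.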

Train a machine learning model on the training data. In this example, we'll use a Multi-Layer Perceptron (MLP) classifier:

model = MLPClassifier(hidden_layer_sizes=(50,), max_iter=10)
model.fit(X_train, y_train)
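
With only 10 iterations the optimizer usually stops before it has converged, and scikit-learn reports this with a `ConvergenceWarning`. The sketch below compares a small and a larger iteration budget. It assumes `load_digits` as an offline stand-in for USPS, and the value 300 is an arbitrary choice.

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

X_demo, y_demo = load_digits(return_X_y=True)
X_demo = X_demo / 16.0  # load_digits stores intensities in 0..16

results = {}
for max_iter in (10, 300):
    clf = MLPClassifier(hidden_layer_sizes=(50,), max_iter=max_iter, random_state=0)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        clf.fit(X_demo, y_demo)
    # True when training stopped because the iteration budget ran out.
    hit_limit = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    results[max_iter] = (clf.n_iter_, clf.loss_, hit_limit)
    print(f"max_iter={max_iter}: ran {clf.n_iter_} iterations, "
          f"final loss {clf.loss_:.4f}, hit limit: {hit_limit}")
```

If the final loss is still falling when the limit is reached, raising `max_iter` is a cheap first step before tuning anything else.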

Evaluate the model's performance on the test data:

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
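
A single accuracy figure hides which digits are confused with each other. The sketch below uses `confusion_matrix` on a small set of made-up labels (the `_demo` arrays are illustrative, not model output) to show how per-class recall is derived.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

y_true_demo = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred_demo = np.array([0, 1, 1, 1, 2, 0, 2])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1, 2])

# Recall per class: correct predictions divided by the number of true samples.
per_class_recall = cm.diagonal() / cm.sum(axis=1)
overall_accuracy = accuracy_score(y_true_demo, y_pred_demo)

print(cm)
print("Recall per class:", per_class_recall)
print("Accuracy:", overall_accuracy)
```

The same two calls can be applied to `y_test` and `y_pred` from the steps above to see which digits the model mixes up.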

Visualize a few example predictions:

fig, axes = plt.subplots(3, 3, figsize=(8, 8))
for i, ax in enumerate(axes.flat):
    ax.imshow(X_test[i].reshape(16, 16), cmap='gray')
    ax.set_title(f"True: {y_test[i]}, Pred: {y_pred[i]}")
    ax.axis('off')
plt.show()

This is a basic example of how to perform digit recognition on the USPS dataset using Python. You can explore different classifiers and experiment with hyperparameter tuning to improve the accuracy of the model.
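
One way to run such a hyperparameter search is `GridSearchCV`. The sketch below is an illustration under assumptions: it tunes an SVM instead of the MLP, uses `load_digits` as an offline stand-in for USPS, and the grid values are arbitrary.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X_demo, y_demo = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

# Scaling sits inside the pipeline so that it is refitted on each CV training fold.
pipeline = make_pipeline(MinMaxScaler(), SVC())

# Parameter names are prefixed with the step name that make_pipeline generates.
param_grid = {"svc__C": [1, 10], "svc__gamma": ["scale", 0.01]}

search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X_tr, y_tr)

test_accuracy = search.score(X_te, y_te)  # uses the best estimator, refitted on X_tr
print("Best parameters:", search.best_params_)
print("Cross-validated accuracy:", round(search.best_score_, 3))
print("Test accuracy:", round(test_accuracy, 3))
```

The test set is touched only once, after the search has finished, so the reported test accuracy is not biased by the tuning.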


Accuracy

The table of accuracy:

Dataset

You can download the USPS dataset from here:

https://www.kaggle.com/datasets/bistaumanga/usps-dataset

Place the dataset in your working directory, then run the code.

AISoltani / ML_Project_DigitRecognition_USPS
