This project explores the classification of handwritten digits using Singular Value Decomposition (SVD). A labeled training set builds the classifier, and a separate test set evaluates its accuracy—mirroring applications like automatic zip code recognition.
Handwritten digit images are grayscale matrices where each entry is a pixel’s brightness. Here, each image is a 28×28 matrix (flattened into a 784-dimensional vector). For each digit d = 0, …, 9, training images are stacked as columns of a matrix—enabling efficient linear-algebra operations like SVD.
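As a minimal sketch of this representation, the snippet below flattens 28×28 images into 784-dimensional vectors and stacks the images of one digit as columns. The synthetic arrays and the helper name `stack_digit` are assumptions for illustration; the layout of the project's `.npy` files may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for labeled training data (hypothetical shapes):
# 50 images of 28x28 pixels, with labels cycling through 0..9.
images = rng.random((50, 28, 28))
labels = np.arange(50) % 10

def stack_digit(images, labels, d):
    """Return the 784 x n_d matrix whose columns are the flattened images of digit d."""
    selected = images[labels == d]                 # shape (n_d, 28, 28)
    return selected.reshape(len(selected), -1).T   # shape (784, n_d)

A3 = stack_digit(images, labels, 3)
print(A3.shape)  # (784, 5)
```

Each column of `A3` is one image flattened in row-major order, so a column can be turned back into a picture with `reshape(28, 28)`.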
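A grayscale image can be represented by a $28 \times 28$ matrix of pixel brightness values.
Each image is a flattened vector of size $784$.
For each digit $d = 0, \dots, 9$, the training images are stacked as columns of a matrix $A^{(d)}$.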
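Training columns for one digit lie in a low-dimensional subspace.
The columns of $U$ in the SVD $A^{(d)} = U \Sigma V^T$ form an orthonormal basis for the column space of $A^{(d)}$.
We have
$$A^{(d)} = \sum_{i} \sigma_i u_i v_i^T,$$
thus the $j$-th column of $A^{(d)}$ is
$$a_j = \sum_{i} (\sigma_i v_{ij})\, u_i,$$
which means that the coordinates of image $j$ in the basis $\{u_i\}$ are $\sigma_i v_{ij}$, where $v_{ij}$ is the $j$-th entry of $v_i$.
Solving the least-squares problem
$$\min_{\alpha} \Big\| a_j - \sum_{i} \alpha_i u_i \Big\|_2$$
yields the optimal vector with entries $\alpha_i = u_i^T a_j = \sigma_i v_{ij}$
and zero residual.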
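The claim that a training column has coordinates $\sigma_i v_{ij}$ in the basis of left singular vectors, with zero least-squares residual, can be checked numerically. The sketch below uses a random matrix as a stand-in for one digit's training matrix; the size and the column index are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((784, 20))   # synthetic stand-in for one digit's training matrix

# Thin SVD: A = U @ diag(s) @ Vt, where U has orthonormal columns.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

j = 7                       # arbitrary training column
a_j = A[:, j]

# Because U has orthonormal columns, the least-squares solution is U^T a_j.
alpha = U.T @ a_j
expected = s * Vt[:, j]     # entries sigma_i * (j-th entry of v_i)

residual = np.linalg.norm(a_j - U @ alpha)
print(np.allclose(alpha, expected), residual < 1e-10)
```

The residual is zero up to rounding only because `a_j` is itself a column of `A`; an image outside the training set generally leaves a nonzero residual, which is what the classifier exploits.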
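Columns of $U^{(d)}$, the left singular vectors of $A^{(d)}$, span the subspace of digit $d$.
Use a rank-$k$ basis $U_k^{(d)}$ made of the first $k$ left singular vectors.
For a test digit $\delta$, compute the residual $\big\| (I - U_k^{(d)} U_k^{(d)T})\,\delta \big\|_2$ for each $d$,
and predict the label with the smallest residual.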
- For each digit $d \in \{0,\dots,9\}$, stack its training images as columns of $A^{(d)}$.
- Compute the SVD of $A^{(d)}$ and keep the first $k$ left singular vectors $U_k^{(d)}$.
- For a test image $\delta$, compute the residual to each class subspace and predict the digit with the smallest residual.
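The three steps can be sketched as follows. The function names (`fit_bases`, `residual`, `predict`) and the two-class synthetic data are assumptions for illustration, not the contents of `svd_classifier.py`.

```python
import numpy as np

def fit_bases(class_matrices, k):
    """For each class matrix (784 x n), keep the first k left singular vectors."""
    bases = {}
    for d, A in class_matrices.items():
        U, _, _ = np.linalg.svd(A, full_matrices=False)
        bases[d] = U[:, :k]
    return bases

def residual(Uk, delta):
    """Euclidean norm of delta minus its orthogonal projection onto span(Uk)."""
    return np.linalg.norm(delta - Uk @ (Uk.T @ delta))

def predict(bases, delta):
    """Return the class whose subspace leaves the smallest residual."""
    return min(bases, key=lambda d: residual(bases[d], delta))

# Toy check: two synthetic classes, each confined to its own 3-dimensional subspace.
rng = np.random.default_rng(2)
class_matrices = {}
for d in (0, 1):
    basis = rng.standard_normal((784, 3))
    class_matrices[d] = basis @ rng.standard_normal((3, 30))

bases = fit_bases(class_matrices, k=3)
delta = class_matrices[1] @ rng.standard_normal(30)  # lies in class 1's subspace
print(predict(bases, delta))  # 1
```

Computing `Uk @ (Uk.T @ delta)` avoids forming the 784×784 projection matrix explicitly, which keeps the cost per class at two thin matrix-vector products.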
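Compute the SVD for digits 3 and 8 (each a $784 \times 400$ matrix).
Stack 400 training images per digit to form each matrix. Apply SVD, plot the singular values (linear scale), and visualize the first three singular images, that is, the first three left singular vectors reshaped to $28 \times 28$.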
Assess how well each digit fits a low-dimensional subspace, compare singular value decay, and interpret singular images as prototypes/variations within each class.
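One way to put numbers on the decay is the cumulative energy fraction $\sum_{i \le k} \sigma_i^2 / \sum_i \sigma_i^2$. The sketch below computes it along with the leading singular images. The helper name `svd_summary` and the random stand-in matrix are assumptions, row-major flattening of the images is assumed, and plotting is left out so the snippet depends only on NumPy.

```python
import numpy as np

def svd_summary(A, n_images=3):
    """Return singular values, cumulative energy fractions, and leading singular images."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    # Reshape each of the first n_images left singular vectors to 28 x 28
    # (assumes the images were flattened in row-major order).
    singular_images = U[:, :n_images].T.reshape(n_images, 28, 28)
    return s, energy, singular_images

rng = np.random.default_rng(3)
A_demo = rng.random((784, 400))   # synthetic stand-in for a 784 x 400 digit matrix
s, energy, singular_images = svd_summary(A_demo)
print(s.shape, singular_images.shape)  # (400,) (3, 28, 28)
```

Running the same function on the matrices for digits 3 and 8 and comparing `energy[k - 1]` at a fixed `k` would indicate which digit is captured better by a `k`-dimensional subspace.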
Implement an SVD-based digit classifier using 400 training images per digit (ten $784 \times 400$ matrices, one per digit).
- Train bases: For each digit $d$, build $A^{(d)}$ and keep the top-$k$ left singular vectors $U_k^{(d)}$ for $k = 5, \ldots, 15$.
- Classify: For each test image, project onto each $U_k^{(d)}$ and choose the digit with the smallest projection residual (Euclidean norm).
- Evaluate: Test all 40,000 images against `TestLabels.npy` and report per-digit accuracy for each $k$.
Quantify how performance depends on $k$.
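A per-digit accuracy table for several basis sizes could be produced along these lines. This is a sketch under stated assumptions: the function name `per_digit_accuracy`, the column-per-image layout of `test_X`, and the small three-class synthetic data are all hypothetical, and the real `.npy` files may be organized differently.

```python
import numpy as np

def per_digit_accuracy(U_full, test_X, test_y, k):
    """U_full: dict digit -> matrix with orthonormal columns (at least k of them).
    test_X: 784 x m matrix with one test image per column; test_y: length-m labels.
    Returns dict digit -> fraction of that digit's test images classified correctly."""
    digits = sorted(U_full)
    # residuals[i, j] = distance from test image j to the rank-k subspace of digits[i]
    residuals = np.stack([
        np.linalg.norm(test_X - U_full[d][:, :k] @ (U_full[d][:, :k].T @ test_X), axis=0)
        for d in digits
    ])
    predicted = np.array(digits)[np.argmin(residuals, axis=0)]
    return {d: float(np.mean(predicted[test_y == d] == d)) for d in digits}

# Synthetic demo: three classes, each confined to its own 6-dimensional subspace.
rng = np.random.default_rng(4)
U_full, cols, labs = {}, [], []
for d in range(3):
    basis = rng.standard_normal((784, 6))
    train = basis @ rng.standard_normal((6, 40))
    U_full[d] = np.linalg.svd(train, full_matrices=False)[0]
    cols.append(basis @ rng.standard_normal((6, 10)))
    labs += [d] * 10
test_X, test_y = np.hstack(cols), np.array(labs)

for k in (2, 4, 6):
    print(k, per_digit_accuracy(U_full, test_X, test_y, k))
```

Computing the SVD once per digit and slicing `[:, :k]` for each `k` avoids refactoring the training matrices for every basis size, which matters when sweeping `k` from 5 to 15.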
- `TrainDigits.npy`, `TrainLabels.npy`: Training data and labels (10 classes)
- `TestDigits.npy`, `TestLabels.npy`: Test data and ground truth (note: large files)
- `svd_classifier.py`: Main Python script implementing the classifier
- `README.md`: This file

