parkernisbet / mnist-digit-classifier

Gaussian naive Bayes classifier for digits in the MNIST dataset. Similar in nature to my other repo ("newsgroup-naive-bayes"), but where that repo covers multinomial document classification, this one explores Gaussian image classification. Covariance smoothing is used to bring the error rate down to roughly 4%.

MNIST Digit Classifier

Similar in nature to my other GitHub repo, "newsgroups-naive-bayes", this repo implements a naive Bayes classifier from scratch (i.e. without the use of pre-built sklearn modules). Where the two differ is in what is being classified: text documents for the "newsgroups" repo (multinomial NB), and digit images for this repo (Gaussian NB).

Images from the MNIST handwritten digits database were first loaded as flattened 1D arrays (28x28 pixels, so a 784-value list per image), and then split and reorganized into train / test / validation data and labels. Log priors for each of the 10 digits (0 - 9) were computed ahead of the classifier function, while the probability density functions were computed on the fly (while running the classifier).

Covariance smoothing was used to decrease the error rate, which got as low as 4.35% at a c value of .04. Admittedly, a finer-grained search over c would likely have yielded a smaller error rate, though ~4% was good enough for the purposes of this example project. I did look at a couple of related projects submitted by other users on Kaggle (based on the original MNIST dataset and other expanded derivatives) and noted that while near-perfect accuracy is achievable (error rates approaching <1%), this was mostly done with neural networks or other more complicated algorithms.
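The steps described above (log priors, Gaussian densities evaluated at classification time, and smoothing with a constant c) can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes a diagonal covariance (the naive independence assumption) with c added to every per-pixel variance, whereas the notebook may smooth a full covariance matrix instead. The function names (`fit_gaussian_nb`, `log_posterior`, `predict`, `error_rate`) and the four-pixel toy data are hypothetical, and plain Python is used only to keep the sketch dependency-free; on real 784-pixel images a vectorized library would be the practical choice.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, c=0.04):
    """Estimate a log prior, per-feature means, and smoothed variances per class.

    X: list of flattened feature vectors; y: list of class labels.
    c: smoothing constant added to every variance (assumed form of smoothing).
    """
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    model = {}
    n_total = len(X)
    for label, rows in by_class.items():
        n = len(rows)
        n_features = len(rows[0])
        means = [sum(r[j] for r in rows) / n for j in range(n_features)]
        # Adding c keeps every variance strictly positive, even for a pixel
        # that never changes within a class (its raw variance is 0).
        variances = [
            sum((r[j] - means[j]) ** 2 for r in rows) / n + c
            for j in range(n_features)
        ]
        model[label] = (math.log(n / n_total), means, variances)
    return model


def log_posterior(model, x):
    """Return the unnormalized log posterior of x for each class."""
    scores = {}
    for label, (log_prior, means, variances) in model.items():
        # Sum of per-feature Gaussian log densities.
        log_lik = sum(
            -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
            for xj, m, v in zip(x, means, variances)
        )
        scores[label] = log_prior + log_lik
    return scores


def predict(model, x):
    """Pick the class with the highest log posterior."""
    scores = log_posterior(model, x)
    return max(scores, key=scores.get)


def error_rate(model, X, y):
    """Fraction of samples whose predicted label differs from the true label."""
    wrong = sum(predict(model, x) != label for x, label in zip(X, y))
    return wrong / len(X)


# Toy stand-in for flattened images: 4 "pixels", 2 classes.
# The third pixel is always 0.0, like a blank border pixel in MNIST.
X_train = [
    [0.0, 0.1, 0.0, 0.9],
    [0.1, 0.0, 0.0, 1.0],
    [0.9, 1.0, 0.0, 0.1],
    [1.0, 0.9, 0.0, 0.0],
]
y_train = [0, 0, 1, 1]

model = fit_gaussian_nb(X_train, y_train, c=0.04)
print(predict(model, [0.05, 0.05, 0.0, 0.95]))
print(error_rate(model, X_train, y_train))
```

The constant third pixel shows why smoothing matters in this setup: without the added c its variance would be exactly zero and the log density would be undefined. Choosing c would then amount to calling `error_rate` on held-out validation data for several candidate values and keeping the best one.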


Languages

Language:Jupyter Notebook 100.0%