The following code provides a few scripts to generate the datasets we used, along with the code to run the tests of each classifier. We provide the datasets we used for classification, but the user is free to overwrite them; be warned that the accuracies of each classifier may differ if the datasets are overwritten with different digits. Because the datasets we trained and tested with are already provided in this repository, dataset creation is not a necessary step to run this code.
To create the datasets, we have designed the workflow so that only the `genDatasets.m` script needs to be run. By default, we have commented out the calls that generate the `mnistData.mat` and `fontGenData.mat` files, as they can take a long time; we instead provide those two datasets with this project repository. If you would like to create each dataset on your own, we detail the steps below.
To create `HWData.mat`:

- Run the `genDataFromMnist.m` file; it should produce the `mnistData.mat` file. Load this file for the next step.
- Call `genTrainTestSets`, passing it the `mnistInputMat` and `mnistTargetMat` from `mnistData.mat`, along with the percentages of digits to use for training and testing (we used 90% for training and 10% for testing). This function outputs four matrices: the first two are the input and target matrices for training, and the last two are the input and target matrices for testing.
- Name the input matrices `trainHW` and `testHW`, and the target matrices `trainHWTargets` and `testHWTargets`. From here, save all four matrices into the file `HWData.mat`.
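The steps above can be sketched in MATLAB as follows; the output order of `genTrainTestSets` and whether the split percentages are passed as whole numbers (e.g. `90`) or fractions are assumptions to verify against the actual function in this repository.

```matlab
% Sketch of creating HWData.mat, assuming genTrainTestSets returns the
% training input/target pair followed by the test input/target pair, and
% that the split percentages are passed as whole numbers (assumptions).
genDataFromMnist;                      % produces mnistData.mat
load('mnistData.mat');                 % loads mnistInputMat, mnistTargetMat
[trainHW, trainHWTargets, testHW, testHWTargets] = ...
    genTrainTestSets(mnistInputMat, mnistTargetMat, 90, 10);
save('HWData.mat', 'trainHW', 'trainHWTargets', 'testHW', 'testHWTargets');
```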
To create `FGData.mat`:

- Run the `genTextDigitDataset.m` function, passing it a desired font name (we used the Courier font); it should produce the `fontGenData.mat` file. Load this file for the next step.
- Call `genTrainTestSets`, passing it the `fontGenInputMat` and `fontGenTargetMat` from `fontGenData.mat`, along with the percentages of digits to use for training and testing (we used 90% for training and 10% for testing). This function outputs four matrices: the first two are the input and target matrices for training, and the last two are the input and target matrices for testing.
- Name the input matrices `trainFG` and `testFG`, and the target matrices `trainFGTargets` and `testFGTargets`. From here, save all four matrices into the file `FGData.mat`.
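The font-generated dataset can be assembled analogously; as before, the output order of `genTrainTestSets` and the percentage format are assumptions to check against the actual function.

```matlab
% Sketch of creating FGData.mat using the Courier font, under the same
% assumptions about genTrainTestSets as for the handwritten set.
genTextDigitDataset('Courier');        % produces fontGenData.mat
load('fontGenData.mat');               % loads fontGenInputMat, fontGenTargetMat
[trainFG, trainFGTargets, testFG, testFGTargets] = ...
    genTrainTestSets(fontGenInputMat, fontGenTargetMat, 90, 10);
save('FGData.mat', 'trainFG', 'trainFGTargets', 'testFG', 'testFGTargets');
```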
To create `HWFGData.mat`:

- Create the `mnistData.mat` and `fontGenData.mat` files as in the previous steps.
- Call `genTrainTestSets`, passing it the `mnistInputMat` and `mnistTargetMat` from the `mnistData.mat` file, along with the percentages of digits to use for training and testing (we used 27% for training and 5% for testing). This function outputs four matrices: the first two are the input and target matrices for training, and the last two are the input and target matrices for testing. Name the input matrices `trainHW` and `testHW`, and the target matrices `trainHWTargets` and `testHWTargets`.
- Call `genTrainTestSets`, passing it the `fontGenInputMat` and `fontGenTargetMat` from `fontGenData.mat`, along with the percentages of digits to use for training and testing (we used 63% for training and 5% for testing). Name the input matrices `trainFG` and `testFG`, and the target matrices `trainFGTargets` and `testFGTargets`.
- After creating all eight matrices, join (column-wise) `trainHW` and `trainFG` into one matrix called `trainHWFG`. Do the same for `testHW` and `testFG` to form `testHWFG`; similarly, create `trainHWFGTargets` and `testHWFGTargets`. This results in four matrices: the training and test sets with their targets.
- Save the resulting four matrices into `HWFGData.mat`.
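The column-wise joins above amount to simple horizontal concatenation, assuming each column is one digit sample; a sketch with the eight matrices already in the workspace:

```matlab
% Combine the handwritten (HW) and font-generated (FG) sets column-wise,
% assuming one sample per column so the row counts match.
trainHWFG        = [trainHW, trainFG];
testHWFG         = [testHW, testFG];
trainHWFGTargets = [trainHWTargets, trainFGTargets];
testHWFGTargets  = [testHWTargets, testFGTargets];
save('HWFGData.mat', 'trainHWFG', 'trainHWFGTargets', ...
     'testHWFG', 'testHWFGTargets');
```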
If you would like to run both of the classifiers without any setup, run the `main.m` file after uncommenting the two run calls inside it. If you would like to run the classifiers individually, follow the steps below.
To use the average digit classifier, run the `runTestsForAvrgCl.m` file after having generated all of the datasets. Running the script should produce nine confusion matrices, which detail the precision, recall, and accuracy of the AVGC for each possible combination of training and test set.
To create, train, and test the neural networks, run the `runTestForNNCl.m` file after having generated all of the datasets. This script creates each of the three neural networks and then performs tests on each. The results from each test appear as a confusion matrix; there should be a total of nine confusion matrices, which detail the precision, recall, and accuracy of the NNC for each possible combination of training and test set.
We note that all of these steps can also be found in our project paper.