Here we have 1 input layer, 1 hidden layer and 1 output layer. We are provided with initial weights for each layer.
In each layer, starting from the hidden layer, a sigmoid function is applied to the output to introduce non-linearity.
Finally, the two outputs give two error terms, E1 and E2. Each error is multiplied by 1/2 so that the factor of 2 from the squared term cancels when calculating derivatives.
We compute the output of each layer by multiplying by the given weights, applying the sigmoid wherever the architecture specifies it.
E_total is the sum of E1 and E2, where t1 and t2 are the two target (true) labels.
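The forward pass and loss described above can be sketched in NumPy as follows. The input, weight, and target values here are hypothetical placeholders for illustration, not the initial values supplied in the assignment:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values (the assignment provides its own initial weights)
x  = np.array([0.05, 0.10])           # two inputs
W1 = np.array([[0.15, 0.20],          # w1, w2
               [0.25, 0.30]])         # w3, w4  (input -> hidden)
W2 = np.array([[0.40, 0.45],          # w5, w6
               [0.50, 0.55]])         # w7, w8  (hidden -> output)
t  = np.array([0.01, 0.99])           # targets t1, t2

# Forward pass: weighted sum, then sigmoid at each layer
h = sigmoid(W1 @ x)                   # hidden activations
o = sigmoid(W2 @ h)                   # output activations o1, o2

# Per-output squared errors, scaled by 1/2 for clean derivatives
E = 0.5 * (t - o) ** 2                # E1, E2
E_total = E.sum()
```

Scaling each error by 1/2 means the derivative of E w.r.t. an output is simply (o - t), with no stray factor of 2.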
Since we have two weight layers (hidden and output), we backpropagate E_total with respect to the weights in both layers.
To backpropagate through the output layer, we calculate the derivatives of E_total w.r.t. w5, w6, w7, w8.
To backpropagate through the hidden layer, we calculate the derivatives of E_total w.r.t. w1, w2, w3, w4.
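The two derivative steps above can be sketched with the chain rule in NumPy. This reuses the same hypothetical toy network as in the forward-pass sketch; it illustrates the mechanics, not the assignment's exact numbers:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Same hypothetical toy network as the forward pass
x  = np.array([0.05, 0.10])
W1 = np.array([[0.15, 0.20], [0.25, 0.30]])
W2 = np.array([[0.40, 0.45], [0.50, 0.55]])
t  = np.array([0.01, 0.99])
h = sigmoid(W1 @ x)
o = sigmoid(W2 @ h)

# Output layer (w5..w8), chain rule:
#   dE/do     = (o - t)        (the 1/2 cancels the square's 2)
#   do/dnet   = o * (1 - o)    (sigmoid derivative)
#   dnet/dw   = h              (the hidden activation feeding that weight)
delta_o = (o - t) * o * (1 - o)
dE_dW2 = np.outer(delta_o, h)         # gradients for w5, w6, w7, w8

# Hidden layer (w1..w4): propagate delta_o back through W2,
# then apply the sigmoid derivative at the hidden layer
delta_h = (W2.T @ delta_o) * h * (1 - h)
dE_dW1 = np.outer(delta_h, x)         # gradients for w1, w2, w3, w4
```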
As we increase the learning rate from 0.1 to 2, we see that the loss drops drastically and tends to 0 in fewer iterations.
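The effect of the learning rate can be demonstrated with a plain gradient-descent loop on the same hypothetical toy network (placeholder values, not the assignment's):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(lr, steps=100):
    # Hypothetical toy network, as in the earlier sketches
    x  = np.array([0.05, 0.10])
    t  = np.array([0.01, 0.99])
    W1 = np.array([[0.15, 0.20], [0.25, 0.30]])
    W2 = np.array([[0.40, 0.45], [0.50, 0.55]])
    for _ in range(steps):
        h = sigmoid(W1 @ x)
        o = sigmoid(W2 @ h)
        delta_o = (o - t) * o * (1 - o)
        delta_h = (W2.T @ delta_o) * h * (1 - h)
        # Vanilla gradient descent: w <- w - lr * dE/dw
        W2 -= lr * np.outer(delta_o, h)
        W1 -= lr * np.outer(delta_h, x)
    # Recompute the loss with the final weights
    h = sigmoid(W1 @ x)
    o = sigmoid(W2 @ h)
    return float(np.sum(0.5 * (t - o) ** 2))

# The larger learning rate drives E_total toward 0 in far fewer steps
print(train(0.1), train(2.0))
```

On a smooth one-sample problem like this, a large step size is safe; on real datasets a learning rate of 2 can easily overshoot and diverge.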
Achieve 99.4% or more accuracy on the MNIST dataset with the following constraints -
We use seven convolution layers, two max-pooling layers, and two transition layers, followed by an average-pooling layer.
The model uses 18,738 parameters in total, which satisfies the constraint of fewer than 20k parameters.
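How such an architecture stays under the budget can be sketched by adding up per-layer parameter counts. The channel progression below is a hypothetical example (it does not reproduce the actual model or its 18,738 figure); it only shows the counting, with bias-free 3x3 convolutions and 1x1 "transition" convolutions that shrink the channel count after pooling:

```python
def conv_params(c_in, c_out, k=3, bias=False):
    """Parameters in a conv layer: k*k*c_in*c_out (+ c_out if bias)."""
    return k * k * c_in * c_out + (c_out if bias else 0)

# Hypothetical channel progression: seven 3x3 convs, two 1x1 transitions
layers = [
    conv_params(1, 8),          # conv1: 72
    conv_params(8, 8),          # conv2: 576
    conv_params(8, 16),         # conv3: 1152
    conv_params(16, 8, k=1),    # transition 1 (after max-pool): 128
    conv_params(8, 16),         # conv4: 1152
    conv_params(16, 16),        # conv5: 2304
    conv_params(16, 8, k=1),    # transition 2 (after max-pool): 128
    conv_params(8, 16),         # conv6: 1152
    conv_params(16, 10),        # conv7, 10 output channels before GAP: 1440
]
total = sum(layers)
print(total)  # 8104, well under the 20,000-parameter constraint
```

Max-pooling and average-pooling layers contribute no parameters, which is why the budget is dominated by the convolutions alone.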
We achieved 99.43% accuracy on the validation dataset at epoch 18; although accuracy dropped slightly in later epochs, this met the acceptance criteria for the assignment.