This is a project to understand how neural networks are implemented.
The aim is to create a library similar to PyTorch, with basic, rudimentary functionality.
The library is named chochin, and the following functionality is present so far:
- chochin.nn.Linear
  - Linear layer
- chochin.activations.<function>
  - ReLU
  - Sigmoid
- chochin.loss.<function>
  - MSELoss()
  - BCELoss()
- chochin.optim.<function>
  - SGD
The demo.ipynb file shows the usage of the library.
chochin (`import chochin`):
```
chochin
|---nn
|   |---Linear
|---activations
|   |---ReLU
|   |---Sigmoid
|---loss
|   |---MSELoss
|   |---BCELoss
|---optim
    |---SGD
```
- In PyTorch everything is a `torch.tensor`; here everything is a `numpy.ndarray`, and an error is raised otherwise.
- Basic import: `import chochin`
- Custom class: `class MyClass(chochin.nn.NeuralNetwork):`
- Layers: `self.hidden1 = chochin.nn.Linear(in_features=<some num>, out_features=<some num>, bias=<True or False>, requires_grad=<default True>)`
- Activation functions: `self.activation_fun = chochin.activations.ReLU()`
- Loss functions: `loss_fun = chochin.loss.MSELoss()`, which returns an object.
- Loss calculation: `loss = loss_fun(yhat, y)`, which returns a scalar value, unlike PyTorch.
- Calculating gradients: `model.backward(loss_fun.backward())`, which does not return anything, unlike PyTorch.
- Optimizer: `optimizer = chochin.optim.SGD(model, lr=<some num>)`, where `model` is the `chochin.nn.NeuralNetwork` (or subclass) object itself, unlike `model.parameters()` in PyTorch.
- Updating parameters: `optimizer.step()`
- `optim.zero_grads()`: not available; the gradients are overwritten each time.

A minimal end-to-end sketch combining these steps is shown below.
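The sketch uses the calls documented above plus a few details this README does not specify: that models define a `forward` method, that layers and activation objects are callable on `numpy.ndarray` inputs, and that the base class is initialized with `super().__init__()`. Treat those as assumptions rather than the library's confirmed API.

```python
import numpy as np
import chochin

# Hypothetical two-layer network. The `forward` method name, callable layers,
# and `super().__init__()` are assumptions by analogy with PyTorch.
class MyClass(chochin.nn.NeuralNetwork):
    def __init__(self):
        super().__init__()
        self.hidden1 = chochin.nn.Linear(in_features=2, out_features=4, bias=True)
        self.act1 = chochin.activations.ReLU()
        self.out = chochin.nn.Linear(in_features=4, out_features=1, bias=True)
        self.act2 = chochin.activations.Sigmoid()

    def forward(self, x):
        a = self.act1(self.hidden1(x))
        return self.act2(self.out(a))

# Everything is a numpy.ndarray, not a torch.tensor.
x = np.random.rand(10, 2)
y = np.random.randint(0, 2, size=(10, 1)).astype(float)

model = MyClass()
loss_fun = chochin.loss.BCELoss()
optimizer = chochin.optim.SGD(model, lr=0.01)

for epoch in range(100):
    yhat = model.forward(x)              # forward pass on ndarrays
    loss = loss_fun(yhat, y)             # scalar loss, unlike PyTorch
    model.backward(loss_fun.backward())  # backpropagation; returns nothing
    optimizer.step()                     # parameter update; no zero_grads() needed
```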
There is a base class `chochin.nn.NeuralNetwork`, and all further neural network models must necessarily be subclasses of it. It is the equivalent of `torch.nn.Module`.
As of yet only the linear layer is present: `chochin.nn.Linear`, which is the equivalent of `torch.nn.Linear`.
It takes the following parameters:
- in_features: `<int>` number of input features
- out_features: `<int>` number of output features
- bias: `<boolean>` whether a bias term is present or not
- requires_grad: `<boolean>` whether the gradient needs to be calculated or not

An example instantiation is shown below.
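For instance, a hidden layer mapping 8 input features to 4 outputs with a bias term could be declared as follows; the specific numbers are placeholders, not values from this project.

```python
import chochin

# A fully connected layer mapping 8 input features to 4 outputs,
# with a bias term and gradient tracking enabled (the default).
layer = chochin.nn.Linear(in_features=8, out_features=4, bias=True, requires_grad=True)
```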
There is a base class `ActivationFunction`, and all activation functions are necessarily subclasses of it.
They take a parameter `requires_grad` (default `True`); because of the naive way gradients are calculated here, it must necessarily be `True` for backpropagation.
All activation functions necessarily need to initialize:
- `self.function`: a -> h, calculates the output for the provided aggregation value.
- `self.derivative_function`: derivative of h with respect to a, used during backpropagation.

A sketch of a custom activation following this interface is given after the list below.
The available activation functions are:
- ReLU: `chochin.activations.ReLU(requires_grad=True)`
- Sigmoid: `chochin.activations.Sigmoid(requires_grad=True)`
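To illustrate the interface, here is a hedged sketch of what a custom activation might look like. The base class name, `requires_grad`, `self.function`, and `self.derivative_function` come from this README; the import path, constructor signature, and the `LeakyReLU` class itself are assumptions for illustration only.

```python
import numpy as np
from chochin.activations import ActivationFunction  # assumed import path

class LeakyReLU(ActivationFunction):
    """Hypothetical activation written against the stated interface."""
    def __init__(self, slope=0.01, requires_grad=True):
        super().__init__(requires_grad=requires_grad)  # assumed base-class signature
        # a -> h: output for the provided aggregation value
        self.function = lambda a: np.where(a > 0, a, slope * a)
        # dh/da: local derivative used during backpropagation
        self.derivative_function = lambda a: np.where(a > 0, 1.0, slope)
```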
Base class: `chochin.loss.LossFunction`, and all loss functions necessarily need to be subclasses of it.
They take a parameter `requires_grad` (default `True`), which is required for backpropagation.
All loss function subclasses necessarily need to define the following:
- `self.function`: (yhat, y) -> loss
- `self.derivative_function`: (yhat, y) -> d(L)/d(yhat), used during backpropagation.

A sketch of a custom loss following this interface is given after the list below.
Available loss functions:
- `chochin.loss.MSELoss()`
- `chochin.loss.BCELoss()`: it is a basic implementation, and thus the results are not so great.
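As with activations, here is a hedged sketch of a custom loss built against the stated interface. Only `LossFunction`, `requires_grad`, `self.function`, and `self.derivative_function` come from this README; the import path, constructor signature, and the mean-absolute-error loss itself are assumptions for illustration.

```python
import numpy as np
from chochin.loss import LossFunction  # assumed import path

class MAELoss(LossFunction):
    """Hypothetical mean-absolute-error loss written against the stated interface."""
    def __init__(self, requires_grad=True):
        super().__init__(requires_grad=requires_grad)  # assumed base-class signature
        # (yhat, y) -> scalar loss
        self.function = lambda yhat, y: np.mean(np.abs(yhat - y))
        # (yhat, y) -> dL/d(yhat), the gradient handed back to the network
        self.derivative_function = lambda yhat, y: np.sign(yhat - y) / yhat.size
```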
There is no base class for optimizers yet.
Available optimizers:
- `chochin.optim.SGD()`: takes `model` (an instantiation of `chochin.nn.NeuralNetwork` or its subclass) and `lr` (learning rate) as parameters.
- Each layer has a `derivative_function` (inputs -> derivative with respect to the inputs), or calculates the equivalent using other functions.
- Each layer stores the inputs that are required to calculate the derivatives with respect to them.
- Because of this, even the activation functions and the loss need to store the inputs given to them.
- Using the `derivative_function` (or its equivalent) and the saved inputs, the partial derivative at the current node is calculated.
- The partial derivative is then passed to the layers behind it, and, just as in theory, the chain rule is applied by taking the product of the partial derivatives.

A small standalone sketch of this idea is given below.
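To make the flow concrete, here is a minimal, standalone numpy sketch (not chochin code) of the same idea: a sigmoid node stores its input during the forward pass, and its backward pass multiplies the incoming gradient by its local derivative, exactly as the chain rule prescribes.

```python
import numpy as np

class SigmoidNode:
    """Standalone node: store the input, multiply the local derivative into the chain."""
    def forward(self, a):
        self.a = a                          # save the input needed for the derivative
        return 1.0 / (1.0 + np.exp(-a))

    def backward(self, upstream_grad):
        h = 1.0 / (1.0 + np.exp(-self.a))
        local = h * (1.0 - h)               # dh/da computed from the saved input
        return upstream_grad * local        # chain rule: product of partial derivatives

node = SigmoidNode()
h = node.forward(np.array([0.5, -1.0]))
grad_to_previous_layer = node.backward(np.array([1.0, 1.0]))
```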