Lecture on regularization and optimization

Introduction: evaluating and improving your neural network

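Building a network is a process of trial and error (among other things, through tuning hyperparameters):
Explain the use of a training set, a hold-out cross-validation ("dev") set and a test set, plus common proportions (e.g. 60/20/20 for small datasets; much smaller dev and test fractions, such as 98/1/1, when there are millions of examples)
Explain bias (high bias = underfitting) and variance (high variance = overfitting)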
When overfitting → use regularization
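A minimal sketch of a train / dev / test split, assuming NumPy arrays with one example per row. The function name `train_dev_test_split` and the 60/20/20 defaults are illustrative choices, not fixed rules. The data is shuffled first so that all three sets come from the same distribution.

```python
import numpy as np

def train_dev_test_split(X, y, dev_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle examples, then split them into train / dev (hold-out CV) / test sets.

    X: array of shape (m, n_features); y: array of shape (m,).
    dev_frac and test_frac are illustrative defaults (60/20/20), not fixed rules.
    """
    m = X.shape[0]
    rng = np.random.default_rng(seed)
    idx = rng.permutation(m)  # shuffle so each set has the same distribution
    n_test = int(m * test_frac)
    n_dev = int(m * dev_frac)
    test_idx = idx[:n_test]
    dev_idx = idx[n_test:n_test + n_dev]
    train_idx = idx[n_test + n_dev:]
    return ((X[train_idx], y[train_idx]),
            (X[dev_idx], y[dev_idx]),
            (X[test_idx], y[test_idx]))
```

With 1,000 examples and the default fractions, this yields 600 training, 200 dev and 200 test examples with no overlap between the sets.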
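One rough way to read bias and variance off the error rates is sketched below. The names `baseline_error` (an assumed best achievable error, e.g. human-level) and `tol` (what counts as a "large" gap) are hypothetical; real projects pick these thresholds case by case.

```python
def diagnose(train_error, dev_error, baseline_error=0.0, tol=0.02):
    """Rough bias/variance check from error rates given as fractions in [0, 1].

    baseline_error: assumed best achievable error (hypothetical parameter).
    tol: hypothetical threshold for what counts as a "large" gap.
    Returns (high_bias, high_variance).
    """
    high_bias = (train_error - baseline_error) > tol  # poor fit even on training data -> underfitting
    high_variance = (dev_error - train_error) > tol   # large train/dev gap -> overfitting
    return high_bias, high_variance
```

For example, 1% training error with 11% dev error points to high variance (overfitting), which is the case where regularization helps; 15% training error with 16% dev error points to high bias (underfitting) instead.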

Regularization

Explain L2 regularization in the context of logistic regression
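For logistic regression, L2 regularization adds (λ / 2m)·‖w‖² to the cross-entropy cost. A minimal NumPy sketch, assuming column-per-example inputs; the function names are illustrative, and `lambd` is spelled that way only because `lambda` is a Python keyword. The bias `b` is conventionally left unpenalized.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def l2_logistic_cost(w, b, X, y, lambd):
    """Cross-entropy cost plus the L2 penalty (lambd / (2m)) * ||w||^2.

    X: shape (n_features, m); y: shape (1, m); w: shape (n_features, 1).
    The bias b is not penalized.
    """
    m = X.shape[1]
    a = sigmoid(w.T @ X + b)  # predicted probabilities, shape (1, m)
    cross_entropy = -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))
    l2_penalty = (lambd / (2 * m)) * np.sum(w ** 2)
    return cross_entropy + l2_penalty
```

With `w` all zeros the penalty vanishes and the cost is log 2 for any λ; raising λ makes large weights more expensive.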
L2 regularization in a neural network
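- L2 regularization adds a term (λ/m)·W to the gradient dW computed in backpropagation, so every update shrinks the weights ("weight decay")
- Intuition of what happens, depending on the size of lambda (λ): a larger λ pushes the weights closer to zero, giving a simpler network that overfits less; too large a λ leads to underfitting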
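The "weight decay" name follows from rewriting the update. A minimal sketch, assuming `dW` is the gradient of the unregularized cost from backprop and `m` is the number of training examples; the two function names are illustrative. Both functions compute the same step: adding (λ/m)·W to the gradient is equivalent to first multiplying W by (1 − αλ/m), a factor slightly below 1.

```python
import numpy as np

def l2_update(W, dW, learning_rate, lambd, m):
    """One gradient-descent step with L2 regularization.

    dW: gradient of the unregularized cost from backprop.
    The L2 term contributes (lambd / m) * W to the gradient.
    """
    dW_reg = dW + (lambd / m) * W
    return W - learning_rate * dW_reg

def weight_decay_form(W, dW, learning_rate, lambd, m):
    """Algebraically identical update: W is first shrunk by (1 - lr * lambd / m)."""
    return (1 - learning_rate * lambd / m) * W - learning_rate * dW
```

With λ = 0 both reduce to plain gradient descent; the larger λ is, the stronger the shrinkage on every step.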
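Dropout regularization: randomly dropping (temporarily removing) nodes in your network during training
- How to implement it? Set a dropout probability and make sure that nodes are dropped at random, with a fresh random choice on every iteration
- Divide the surviving activations by the keep probability ("inverted dropout") so that their expected value doesn't change
- At test time, don't use dropout when making predictions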
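A minimal sketch of inverted dropout for one layer's activations, assuming NumPy and a `keep_prob` parameter (the dropout probability is 1 − keep_prob). The function name is illustrative. The returned mask is what backprop would reuse: the incoming gradient is multiplied by the same mask and divided by `keep_prob`.

```python
import numpy as np

def dropout_forward(A, keep_prob, rng, training=True):
    """Inverted dropout on a layer's activations A.

    keep_prob: probability of keeping each unit (dropout probability = 1 - keep_prob).
    During training each unit is zeroed at random and the survivors are divided by
    keep_prob, so the expected value of every activation stays the same.
    At test time (training=False) A is returned unchanged.
    """
    if not training:
        return A, None
    mask = rng.random(A.shape) < keep_prob  # True = keep this unit
    A_out = A * mask / keep_prob            # scale up survivors ("invert")
    return A_out, mask                      # mask is reused in backprop
```

Because the scaling happens during training, no correction is needed at prediction time: the network is simply run without dropout.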
Max norm technique for regularization: enforce an absolute upper bound on the magnitude of the weight vector for every neuron
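A minimal sketch of the max-norm constraint, assuming a weight matrix whose rows hold each neuron's incoming weights; the function name and the example bound `c=3.0` are assumptions for illustration. Any row whose L2 norm exceeds `c` is rescaled back onto the ball of radius `c`; the usual practice is to apply this right after every parameter update.

```python
import numpy as np

def max_norm_constraint(W, c=3.0):
    """Rescale each neuron's incoming weight vector so its L2 norm is at most c.

    Assumes W has shape (n_units, n_inputs): row i holds the weights of neuron i.
    c=3.0 is only an example value.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)       # one norm per neuron
    scale = np.minimum(1.0, c / np.maximum(norms, 1e-12))  # shrink only rows whose norm exceeds c
    return W * scale
```

For example, with c = 1 the row [3, 4] (norm 5) becomes [0.6, 0.8], while the row [0.3, 0.4] (norm 0.5) is left alone.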
Other regularization techniques?
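- Add more data, e.g. by slightly altering existing images (flips, small shifts, crops) = data augmentation
- Early stopping: plot training error vs. dev set error and stop iterating when they diverge, i.e. when the dev error starts rising while the training error keeps falling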
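A minimal data-augmentation sketch, assuming images are NumPy arrays of shape (H, W) or (H, W, C). The function name and `max_shift` parameter are illustrative; the shift uses `np.roll`, which wraps pixels around the border, a simplification compared with padding and cropping. Such transformations only make sense when they don't change the label (a horizontally flipped cat is still a cat).

```python
import numpy as np

def augment(image, rng, max_shift=2):
    """Return a slightly altered copy of an image of shape (H, W) or (H, W, C).

    Randomly flips it horizontally and shifts it by up to max_shift pixels.
    np.roll wraps pixels around the border (a simplification).
    """
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip (reverse the width axis)
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(out, shift=(dy, dx), axis=(0, 1))  # small translation
    return out
```

Calling `augment` on each training image every epoch effectively gives the network new, label-preserving examples at no labeling cost.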
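Early stopping can be automated with a "patience" rule instead of inspecting the plot by hand. The sketch below works on a precomputed list of dev-set errors (one per epoch) to keep it self-contained; in a real training loop you would evaluate on the dev set after each epoch and save the parameters of the best epoch. The function name and the `patience` parameter are assumptions for illustration.

```python
def early_stopping_epoch(dev_errors, patience=3):
    """Return the epoch whose parameters you would keep.

    dev_errors: dev-set error recorded after each epoch.
    Iteration stops once the dev error has not improved for `patience`
    consecutive epochs (patience is a hypothetical hyperparameter);
    the best epoch seen so far is returned.
    """
    best_epoch, best_error, waited = 0, float("inf"), 0
    for epoch, err in enumerate(dev_errors):
        if err < best_error:
            best_epoch, best_error, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break  # dev error stopped improving: stop iterating
    return best_epoch
```

For the curve [0.30, 0.22, 0.18, 0.17, 0.19, 0.21, 0.24, 0.15], training stops after epoch 6 and epoch 3 is kept; the later dip to 0.15 is never reached, which is the trade-off a small patience value makes.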

anishsingh20 / dl-regularization-optimization