anmolaggarwal98/Deep_learning-SGN

Deep Learning: Heavy-Tail Distribution Analysis of Stochastic Gradient Noise


Non-Gaussian Behaviour of Stochastic Gradient Noise in Deep Learning

Candidate Number: XXXXXXX

In recent years there has been growing interest in Stochastic Gradient Descent (SGD) and its variants (such as AdaDelta and Adam) in machine learning, mainly due to their computational efficiency. It is often assumed that the gradient noise follows a Gaussian distribution for large datasets, by invoking the classical Central Limit Theorem. However, the results in my report (to be published here shortly) show that this is far from true: in fact, stochastic gradient noise (SGN) follows an alpha-stable distribution, a member of a family of heavy-tailed distributions in which alpha is the tail index. For validation, we build two models from scratch, using numpy only for vector operations; keras is used only to import the MNIST and Fashion-MNIST datasets. The models address two questions (a minimal sketch of the tail-index estimation these tests rely on follows the list):

  • Does the choice of activation function have a significant effect on the distribution of SGN? For this we run tests using relu and sigmoid; the implementation can be found in model_epoch_vs_alpha.py. I have run a test in test_epoch_alpha_relu.py, and the resulting graphs can be seen in the mnist_activation folder. Please adjust this file to change the activation function and dataset. Documentation is provided in the docstrings in the models. I have also attached the Jupyter notebook epoch_alpha.ipynb which, although outdated and not well documented, shows my progress as well as the plots.

  • Does the choice of learning rate affect the distribution of SGN? For this we run tests using relu and sigmoid, adjusting the learning rate from 0.001 to 0.1 with a user-defined increment. The implementation can be found in model_lr_vs_alpha.py. I have run a test in test_lr_alpha.py, and the resulting graphs can be seen in the mnist_lr folder. My report is heavily based on the research paper http://proceedings.mlr.press/v97/simsekli19a/simsekli19a.pdf, which you may find useful for understanding what I have done in this project and why.
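For intuition, here is a minimal sketch of the kind of tail-index estimation these tests rely on. It uses the log-moment estimator of Mohammadi et al. (2015), which the Simsekli et al. paper builds on; the function name estimate_alpha, the block size k1, and the synthetic data are illustrative assumptions, not the actual API of this repository.

```python
import numpy as np
from scipy.stats import levy_stable

def estimate_alpha(x, k1=100):
    """Log-moment tail-index estimator (Mohammadi et al., 2015).

    Relies on the stability property: a sum of k1 i.i.d. alpha-stable
    variables scales like k1**(1/alpha), so the gap between the mean
    log-magnitude of block sums and of raw samples reveals 1/alpha.
    """
    x = np.asarray(x).ravel()
    k2 = len(x) // k1
    x = x[: k1 * k2]
    y = x.reshape(k2, k1).sum(axis=1)              # block sums of k1 samples
    eps = 1e-30                                    # guard against log(0)
    inv_alpha = (np.mean(np.log(np.abs(y) + eps))
                 - np.mean(np.log(np.abs(x) + eps))) / np.log(k1)
    return 1.0 / inv_alpha

# Sanity check on synthetic noise: alpha < 2 means heavy tails, while
# alpha = 2 recovers the Gaussian case assumed by the CLT argument.
heavy = levy_stable.rvs(alpha=1.5, beta=0.0, size=100_000, random_state=0)
gauss = np.random.default_rng(0).normal(size=100_000)
print(estimate_alpha(heavy))   # approximately 1.5
print(estimate_alpha(gauss))   # approximately 2.0
```

The epoch and learning-rate experiments above both amount to repeating an estimate like this on SGN collected at different training settings and plotting the resulting alpha values.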

Installation:

Please feel free to clone this repository and play around with the code. I have tried to keep the documentation in the docstrings above each function as understandable as possible.
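If you want a feel for what the models measure before diving into the code, here is a hedged sketch of collecting SGN from a toy softmax classifier: the noise is the difference between a minibatch gradient and the full-batch gradient. The names grad and sgn_samples and the random stand-in data are hypothetical, not the interfaces of model_epoch_vs_alpha.py or model_lr_vs_alpha.py.

```python
import numpy as np

def grad(w, x, y):
    """Gradient of softmax cross-entropy for a toy linear model (hypothetical)."""
    logits = x @ w
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(y)), y] -= 1.0                 # p - one_hot(y)
    return (x.T @ p / len(y)).ravel()

def sgn_samples(w, x, y, batch_size=64, n_batches=500, seed=0):
    """Stochastic gradient noise: minibatch gradient minus full-batch gradient."""
    rng = np.random.default_rng(seed)
    full = grad(w, x, y)
    noise = []
    for _ in range(n_batches):
        idx = rng.choice(len(y), size=batch_size, replace=False)
        noise.append(grad(w, x[idx], y[idx]) - full)
    return np.concatenate(noise)

# Random data standing in for MNIST (hypothetical shapes: 784 inputs, 10 classes).
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 784))
y = rng.integers(0, 10, size=1000)
w = rng.normal(scale=0.01, size=(784, 10))
noise = sgn_samples(w, x, y)
# The noise vector can then be fed to a tail-index estimator such as
# estimate_alpha above; sweeping the learning rate or activation amounts
# to re-training w and re-estimating alpha at each setting.
```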

Hope you like it. Enjoy!
