Neural Benford

Benford's Law is a fascinating property of many naturally occurring collections of numbers: the distribution of their leading digits is non-uniform and skewed towards small digits. It has been shown to apply to a wide variety of datasets, including electricity bills, stock prices, lengths of rivers, Fibonacci numbers and factorials, among others.

This repository contains a Jupyter notebook investigating whether the leading digits of weights in a neural network follow Benford's Law.

It appears that the weights of a network do not follow Benford's Law before training, approximately follow it after convergence, and then deviate from it again once the model starts overfitting.

(Figure: Neural Benford)

A surprising result is that validation accuracy appears to be maximised around the time when the mean absolute deviation (MAD) of the leading-digit distribution from Benford's Law is minimised! This has been observed on MNIST and Fashion MNIST with several different architectures, but needs to be explored on more architectures and datasets.

(Figure: Accuracy vs. Benford Weight MAD)

Benford's Law

From the Wikipedia page, Benford's Law "states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small. For example, in sets that obey the law, the number 1 appears as the leading significant digit about 30% of the time, while 9 appears as the leading significant digit less than 5% of the time. If the digits were distributed uniformly, they would each occur about 11.1% of the time."
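Concretely, the law says a leading digit d occurs with probability log10(1 + 1/d). A minimal sketch of the expected distribution (not code from this repository):

```python
import math

# Expected leading-digit probabilities under Benford's Law:
# P(d) = log10(1 + 1/d) for d in 1..9
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"{d}: {p:.1%}")
# 1: 30.1%, 2: 17.6%, ..., 9: 4.6% -- vs. 11.1% each if digits were uniform
```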

(Figure: Benford's Law Distribution)

Here's a great Numberphile video talking about Benford's Law.

Experiment Details

I compared the leading-digit distribution of the weights before training and after convergence for a convolutional neural network architecture adapted from the Keras documentation. I compared the distributions for the weights of just the first layer and of all layers in the network, on both MNIST and Fashion MNIST.
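Collecting the weights from a Keras model might look like the sketch below (the helper name and the choice to include bias vectors are my assumptions; the notebook's exact code may differ):

```python
import numpy as np

def collect_weights(model, first_layer_only=False):
    """Flatten the trainable parameters of a Keras model into one array.

    Note: layer.get_weights() returns both kernels and biases;
    layers without parameters simply contribute nothing.
    """
    layers = model.layers[:1] if first_layer_only else model.layers
    arrays = [w.flatten() for layer in layers for w in layer.get_weights()]
    return np.concatenate(arrays) if arrays else np.array([])
```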

The leading digit was calculated by ignoring the weight sign and taking the first non-zero digit in the weight value.
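A minimal sketch of that calculation, together with the MAD statistic mentioned above (my reconstruction, not the notebook's exact code):

```python
import math
from collections import Counter

def leading_digit(x):
    """First non-zero digit of x, ignoring sign; None for exact zeros."""
    x = abs(x)
    if x == 0:
        return None
    # Shift the value into [1, 10) so int() yields the leading digit.
    while x < 1:
        x *= 10
    while x >= 10:
        x /= 10
    return int(x)

def benford_mad(values):
    """Mean absolute deviation of the observed leading-digit frequencies
    from the Benford expectation P(d) = log10(1 + 1/d)."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    counts = Counter(digits)
    n = len(digits)
    return sum(
        abs(counts.get(d, 0) / n - math.log10(1 + 1 / d))
        for d in range(1, 10)
    ) / 9
```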

Directions for Future Work

  • Plot the mean absolute deviation over time as the network is trained
  • See if sampling starting weights from Benford's Law improves convergence (see the sketch after this list)
  • Check if the results hold for different architectures/datasets
  • Perform goodness-of-fit distribution tests
  • Compare different weight initializations
  • Report weight statistics before/after training
  • Investigate using the deviation as a measure of network fit
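On the Benford initialization idea above: a value whose log10 is uniform over an interval of integer length has leading digits that follow Benford's Law exactly, which suggests a simple sampler. A hypothetical sketch (the function name and the scale bounds `lo`/`hi` are my assumptions, not part of this repository):

```python
import numpy as np

def benford_init(shape, lo=-4, hi=0, rng=None):
    """Sample weights whose leading digits follow Benford's Law.

    If log10(|w|) is uniform on [lo, hi) with hi - lo an integer,
    the leading digits of |w| follow Benford's Law exactly.
    Signs are assigned uniformly at random.
    """
    rng = np.random.default_rng() if rng is None else rng
    magnitude = 10.0 ** rng.uniform(lo, hi, size=shape)
    sign = rng.choice([-1.0, 1.0], size=shape)
    return sign * magnitude
```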

Inspired by this Reddit thread.
