kutoga / GeometricNN

Geometric Neural Networks

IMPORTANT: This project is not yet finished!

Classical neural networks are based on the classical calculus of Newton and Leibniz. One might ask whether there are useful alternatives.

Why classical calculus?

We mainly focus on derivatives. Classical derivatives are additive:

$$(f + g)'(x) = f'(x) + g'(x)$$

This property is very helpful for differentiating linear functions such as:

$$f(\vec{x}) = \sum_i w_i x_i$$
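As a small illustration (not part of this library), the additivity can be checked symbolically with SymPy; the function choices and symbol names below are assumptions made just for this example:

```python
# Minimal sketch: additivity of the classical derivative, checked with SymPy.
import sympy as sp

x = sp.symbols('x')
f = sp.sin(x)
g = x**3

# (f + g)' == f' + g'  -- additivity of the classical derivative
assert sp.simplify(sp.diff(f + g, x) - (sp.diff(f, x) + sp.diff(g, x))) == 0

# For a linear function f(x) = sum_i w_i * x_i, the derivative with respect
# to each weight is simply the corresponding input, term by term.
w = sp.symbols('w0:3')
xs = sp.symbols('x0:3')
lin = sum(wi * xi for wi, xi in zip(w, xs))
print([sp.diff(lin, wi) for wi in w])   # -> [x0, x1, x2]
```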

One can use various well-known optimization methods that are based on these derivatives.

"Non-classical" calculus?

What even is this? Wikipedia might help you, but the next few lines should make it even easier to follow the idea.

You probably already know the integral operator $\int f(x)\,dx$, and you probably know that it is, in some sense, a continuous version of the sum $\sum_i f(x_i)$. These two operators are closely related. There is also a product operator $\prod_i f(x_i)$ which computes a product over a discrete set of numbers. But is there also a continuous version of this operator? Actually, it is not that hard to derive one:

$$\prod_i f(x_i) = \exp\left(\ln \prod_i f(x_i)\right)$$

$$= \exp\left(\sum_i \ln f(x_i)\right)$$

Now we can replace the sum $\sum$ by the integral operator. This results in:

$$\exp\left(\int_a^b \ln f(x)\,dx\right)$$

Cool, we have a continuous product operator. There is a well-known notation for this operator (which I actually do not like too much, but it is okay):

$$\prod_a^b f(x)^{dx} = \exp\left(\int_a^b \ln f(x)\,dx\right)$$

We can even remove the limits:

$$\prod f(x)^{dx} = \exp\left(\int \ln f(x)\,dx\right)$$
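To get a feeling for this operator, here is a minimal numerical sketch (illustrative only, not part of the library); the helper name `product_integral` and the choice of $f$ are assumptions for this example:

```python
# Approximate the product integral exp(∫ ln f(x) dx) on a grid and compare it
# with the discrete product prod_i f(x_i)**dx.
import numpy as np

def product_integral(f, a, b, n=100_000):
    """Approximate exp(∫_a^b ln f(x) dx) with a simple Riemann sum."""
    xs = np.linspace(a, b, n, endpoint=False)
    dx = (b - a) / n
    return np.exp(np.sum(np.log(f(xs))) * dx)

f = lambda x: 1.0 + x**2
a, b = 0.0, 1.0

# Discrete counterpart: prod_i f(x_i)**dx.  This agrees with the formula above
# by construction, since f(x)**dx = exp(dx * ln f(x)).
xs = np.linspace(a, b, 100_000, endpoint=False)
dx = (b - a) / len(xs)
discrete = np.prod(f(xs) ** dx)

print(product_integral(f, a, b), discrete)   # both ≈ 1.302
```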

This "new" integral-like operator has some interesting properties; probably the most important ones are:

$$\prod \left(f(x) \cdot g(x)\right)^{dx} = \prod f(x)^{dx} \cdot \prod g(x)^{dx}$$

$$\prod \left(f(x)^{c}\right)^{dx} = \left(\prod f(x)^{dx}\right)^{c}$$

One might think about inverting this operator, which is actually quite easy. We denote the resulting expression, the multiplicative derivative, as $f^*(x)$:

$$f^*(x) = \exp\left(\frac{d}{dx} \ln f(x)\right) = \exp\left(\frac{f'(x)}{f(x)}\right)$$

This operator is also multiplicative and has another interesting property: it removes any constant factor from the input function:

$$(f \cdot g)^*(x) = f^*(x) \cdot g^*(x)$$

$$(c \cdot f)^*(x) = f^*(x) \quad \text{for any constant } c \neq 0$$
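These properties are easy to check numerically. The following sketch is illustrative only; the helper name `mul_derivative` and the test functions are assumptions made for this example. It approximates $f^*(x)$ via $\left(f(x+h)/f(x)\right)^{1/h}$ and verifies the definition, the multiplicativity, and the removal of constant factors:

```python
import numpy as np

def mul_derivative(f, x, h=1e-6):
    """Numerical multiplicative derivative: (f(x+h) / f(x)) ** (1/h)."""
    return (f(x + h) / f(x)) ** (1.0 / h)

f = lambda x: np.exp(np.sin(x))
g = lambda x: 2.0 + x**2
x0 = 0.7

# Definition: f*(x) = exp(f'(x)/f(x)); for f = exp(sin) this is exp(cos(x)).
print(mul_derivative(f, x0), np.exp(np.cos(x0)))

# Multiplicative: (f*g)*(x) == f*(x) * g*(x)
fg = lambda x: f(x) * g(x)
print(mul_derivative(fg, x0), mul_derivative(f, x0) * mul_derivative(g, x0))

# Constant factors vanish: (c*f)*(x) == f*(x)
cf = lambda x: 42.0 * f(x)
print(mul_derivative(cf, x0), mul_derivative(f, x0))
```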

As can easily be seen, this operator cannot handle functions with zeros: the operator is undefined at any position where the function value is zero. As for the "normal" derivative, we can derive a chain rule and many other simple derivative rules for this operator. One might find, for example, the following rules:

$$(f \circ g)^*(x) = \left(f^*(g(x))\right)^{g'(x)}$$

$$\left(f(x)^{h(x)}\right)^* = f^*(x)^{h(x)} \cdot f(x)^{h'(x)}$$
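The chain rule above (stated here in its standard multiplicative-calculus form, which is an assumption about the exact original formula) can be checked numerically as well; the functions below are arbitrary examples:

```python
import numpy as np

def mul_derivative(f, x, h=1e-6):
    return (f(x + h) / f(x)) ** (1.0 / h)

f = lambda x: 1.0 + x**2
g = lambda x: np.sin(x) + 2.0
g_prime = lambda x: np.cos(x)

x0 = 0.3
lhs = mul_derivative(lambda x: f(g(x)), x0)   # (f ∘ g)*(x)
rhs = mul_derivative(f, g(x0)) ** g_prime(x0)  # f*(g(x)) ** g'(x)
print(lhs, rhs)   # both give (approximately) the same value
```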

A useful property is that this derivative is multiplicative: it allows a much more efficient computation of the multiplicative derivative of a product (compared to the classical derivative), because each factor (i.e. each function) can be handled independently of the others.
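A small sketch of this efficiency argument (illustrative only; the factor functions are arbitrary examples): for a product of factors, the multiplicative derivative can be computed per factor and the results multiplied together:

```python
import numpy as np

def mul_derivative(f, x, h=1e-6):
    return (f(x + h) / f(x)) ** (1.0 / h)

factors = [lambda x: 1.0 + x**2,
           lambda x: np.exp(x),
           lambda x: 3.0 + np.cos(x)]

x0 = 0.5

# Handle each factor on its own ...
per_factor = np.prod([mul_derivative(f, x0) for f in factors])

# ... and compare with the multiplicative derivative of the full product.
full_product = lambda x: np.prod([f(x) for f in factors])
print(per_factor, mul_derivative(full_product, x0))
```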

The "classical" gradient descent is defined as:

$$w_{t+1} = w_t - \eta \cdot \nabla_w L(w_t)$$

This can also be written component-wise as:

$$w_{t+1,i} = w_{t,i} - \eta \cdot \frac{\partial L(w_t)}{\partial w_i}$$

Of course, given a neural network, $w$ is usually an element of $\mathbb{R}^n$ and describes the weights.

Very similarly, we can define a multiplicative gradient descent:

$$w_{t+1,i} = \frac{w_{t,i}}{\left(\dfrac{\partial^* L(w_t)}{\partial w_i}\right)^{\eta}}
\quad \text{with} \quad
\frac{\partial^* L}{\partial w_i} = \exp\left(\frac{\partial L / \partial w_i}{L}\right)$$
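As a toy comparison (illustrative only; the loss function, starting point and step size are arbitrary choices for this example), both update rules can be run on a one-dimensional, strictly positive loss:

```python
# Classical vs. multiplicative gradient descent on L(w) = (w - 2)**2 + 1.
import numpy as np

L    = lambda w: (w - 2.0) ** 2 + 1.0     # strictly positive, minimum at w = 2
dL   = lambda w: 2.0 * (w - 2.0)          # classical gradient
mulL = lambda w: np.exp(dL(w) / L(w))     # multiplicative gradient exp(L'/L)

eta = 0.2
w_add, w_mul = 4.0, 4.0
for _ in range(50):
    w_add = w_add - eta * dL(w_add)       # classical:      subtract the (scaled) gradient
    w_mul = w_mul / mulL(w_mul) ** eta    # multiplicative: divide by a power of the gradient

print(w_add, w_mul)   # both should end up close to 2
```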

One might ask what this "optimization" method does. Or, even more basic: what is the geometric meaning of this multiplicative derivative? For the classical derivative, we all know that it is the slope of the tangent of the function at a given point. It shows in which direction the function increases (a positive derivative means the function grows for larger $x$, a negative derivative means the function decreases). The multiplicative derivative, on the other hand, shows in which direction the absolute value of the function increases, i.e. the direction away from $0$. This is not given as an absolute amount, but as a factor: if the multiplicative derivative is $> 1$, then the function will have a larger absolute value for larger $x$, and if it is $< 1$, then the function will have a smaller absolute value for larger $x$.

The multiplicative gradient descent therefore divides by the multiplicative gradient. Unfortunately, the multiplicative gradient is independent of the absolute position of $w$, and for large $w$ this division can create very large steps. For this reason, the proposed corrected multiplicative derivative is:

mathematical expression

What are you trying to show?

I want to show that the multiplicative derivative can be useful for optimizing neural networks, especially multiplicative neural networks. I want to show this with a minimal library that implements simple neural networks and serves as a proof of concept that this (maybe?) new method works. Or that it does not work.
