NALU

Implementation of Neural Arithmetic Logic Units as discussed in https://arxiv.org/abs/1808.00508

This implementation

The implementation here deviates from the paper in how it computes the gate variable g.
The paper enforces a dependence of g on the input x with the equation g = σ(Gx), where G is a learnt matrix.
However, for most purposes the gating function depends only on the task, not on the input,
and can be learnt independently of it.
This implementation instead uses g = σ(G), where G is a learnt scalar.

For recurrent tasks, however, it does make sense to condition the gate value on the input.
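
As a concrete reference, here is a minimal PyTorch sketch of a NALU cell using the scalar-gate variant described above. The class, parameter names, and initialization are illustrative assumptions, not the repo's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NALUCell(nn.Module):
    """Minimal NALU cell with a scalar gate (illustrative sketch)."""

    def __init__(self, in_dim, out_dim, eps=1e-8):
        super().__init__()
        self.eps = eps
        # NAC weight parametrization: W = tanh(W_hat) * sigmoid(M_hat),
        # which biases the effective weights toward {-1, 0, 1}.
        self.W_hat = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.M_hat = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        # Scalar gate parameter: g = sigmoid(G) depends only on the task,
        # not on the input x (the deviation from the paper noted above).
        self.G = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        W = torch.tanh(self.W_hat) * torch.sigmoid(self.M_hat)
        a = F.linear(x, W)                                          # add/subtract path
        m = torch.exp(F.linear(torch.log(x.abs() + self.eps), W))   # mult/div path
        g = torch.sigmoid(self.G)                                   # gate is input-independent
        return g * a + (1 - g) * m
```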

Limitations of a single cell NALU

  • Can handle either add/subtract or mult/div operations, but not a combination of both.
  • For mult/div operations, it cannot handle negative targets, as the mult/div path's output
    is the result of an exponentiation operation, which always yields positive results
    (see the snippet after this list).
  • Power operations are only possible when the exponent is in the range [0, 1].
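
A quick illustration of the negative-target limitation, assuming the paper's NAC-style multiplicative path m = exp(W · log(|x| + ε)); the weights here are hand-picked rather than learnt:

```python
import torch

eps = 1e-8
x = torch.tensor([[-3.0, 2.0]])
W = torch.tensor([[1.0, 1.0]])   # hand-picked weights: multiply both inputs together

# mult/div path: exp(W . log(|x| + eps))
m = torch.exp(torch.log(x.abs() + eps) @ W.t())
print(m)  # tensor([[6.0000]]) -- |(-3) * 2|; the sign of the true target -6 is lost
```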

Advantages of using NALU

  • The careful design of the mathematics ensures that the learnt weights allow for both
    interpolation and extrapolation (the sketch after this list illustrates why).
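
A rough sketch of why extrapolation works: the tanh–sigmoid weight parametrization saturates toward {-1, 0, 1}, so a learnt adder computes a near-exact sum that holds at any input magnitude. The saturated parameter values below are hand-picked for illustration:

```python
import torch

# Saturated parameters push the effective weight to (almost exactly) 1,
# so the additive path computes a plain sum regardless of input scale.
W_hat = torch.tensor([[10.0, 10.0]])
M_hat = torch.tensor([[10.0, 10.0]])
W = torch.tanh(W_hat) * torch.sigmoid(M_hat)   # ~[[1.0, 1.0]]

x = torch.tensor([[1e6, 2e6]])                 # far outside a typical training range
print(x @ W.t())                               # ~3e6, i.e. the sum still holds
```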

Note

Power operations beyond the range [0, 1] need two NALUs stacked on top of each other.
The hidden dimensionality of the stacked NALU network must be greater than or equal to the exponent.
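
A hand-worked sketch of the stacking argument, using idealized weights of exactly 1 (which the tanh–sigmoid parametrization can only approach): with hidden width 3, the first multiplicative layer produces three copies of x and the second multiplies them together, giving an effective exponent of 3.

```python
import torch
import torch.nn.functional as F

eps = 1e-8
x = torch.tensor([[4.0]])

W1 = torch.ones(3, 1)   # hidden width 3: each hidden unit computes x^1
W2 = torch.ones(1, 3)   # second layer multiplies the three copies together

h = torch.exp(F.linear(torch.log(x.abs() + eps), W1))  # (x, x, x)
y = torch.exp(F.linear(torch.log(h.abs() + eps), W2))  # x * x * x
print(y)  # tensor([[64.0000]]) = 4**3
```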
