Simple MLP - NeuralNetwork Library For Microcontrollers
Nothing "Import ant", just a simple library for implementing Neural Networks (NNs) easily and effectively on any Arduino board and other microcontrollers.
NN Functions | Input Type (x) | Output Type (Y) | Action |
---|---|---|---|
BackProp(x) | DFLOAT Array | - | Trains the Neural-network: "tells" the NN what the correct/expected output for the X-inputs was and then "teaches" it. |
*FeedForward(x) | DFLOAT Array | DFLOAT Array | Returns the output of it: "feeds" the NN with X-input values and returns the Y-output values, if needed. |
getMeanSqrdError(x) | Unsigned Int | DFLOAT | Returns the Mean Squared Error: MSE is the SSE (Sum Squared Error) divided by the product of the number of outputs and the inputs-per-epoch, aka batch-size. |
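To make the MSE definition above concrete, here is a minimal Python sketch (not the library's C++ code) that accumulates the SSE over a batch and divides it by number_of_outputs * batch_size. The helper name mean_squared_error and the list-of-lists data layout are hypothetical, chosen only for illustration.

```python
def mean_squared_error(predicted, expected):
    """MSE = SSE / (number_of_outputs * batch_size), as described above.

    predicted, expected: lists of equal-length output vectors, one per sample
    (hypothetical layout chosen for illustration).
    """
    batch_size = len(expected)
    number_of_outputs = len(expected[0])
    sse = 0.0  # Sum Squared Error over every output of every sample
    for y_hat, y in zip(predicted, expected):
        for p, e in zip(y_hat, y):
            sse += (e - p) ** 2
    return sse / (number_of_outputs * batch_size)


# Two samples, two outputs each: SSE = 0.25 + 0 + 0 + 0.25 = 0.5, MSE = 0.5 / 4
print(mean_squared_error([[0.5, 1.0], [0.0, 0.5]], [[1.0, 1.0], [0.0, 0.0]]))  # 0.125
```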
Understanding the Basics of a Neural Network:
+ Use of activation-functions per layer-to-layer.
+ Optimizations based on user's preference.
+ Support for custom activation functions.
+ MSE/BCE/CCE loss-functions.
+ Support for double precision.
+ Many activation-functions.
+ Simplicity!
- Better overall code.
- Other training methods.
- More Activation Functions.
- Different weight initialization methods.
- Even more properties, for many different needs.
✨ (See also: the "training with Tensorflow" section)
- If you get an error about 'POINTER_REGS', Click Here.
- Wherever you see the term bias, it means biases if MULTIPLE_BIASES_PER_LAYER is enabled.
- I am NOT a professional in any of those fields, even though I did this [...] I'm stupid in many cases too.
- If you don't want to USE_64_BIT_DOUBLE (which I also suggest you don't use), then make sure you have used 32-bit (4-byte) precision variables when training, because floats "...are stored as 32 bits (4 bytes) of information...get more precision by using a double (e.g. up to 15 digits), on the Arduino, double is the same size as float."
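As a quick illustration of why training precision matters, this Python sketch uses the standard struct module to round values to 32-bit floats, the size a float has on the Arduino. Weights trained in 64-bit precision lose digits this way when copied into a float sketch. The helper name to_float32 is hypothetical.

```python
import struct


def to_float32(value):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack('<f', struct.pack('<f', value))[0]


print(to_float32(0.1))         # 0.10000000149011612 (0.1 is not exact in float32)
print(to_float32(16777217.0))  # 16777216.0 (2**24 + 1 needs more than 24 mantissa bits)
```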
Arduino Uno
- Everything seems to work fine
ESP32-C3
- Uses software-emulated EEPROM, so don't expect the EEPROM examples/functionalities to work on it
ATtiny85
- Doesn't have an FPU, which makes maths on it "difficult" for the SRAM (I think..?)
- If you want to use "Serial" on an ATtiny85, Click Here (be careful: SoftwareSerial uses a lot of SRAM)
- Backprop maths on an ATtiny85 won't work properly (due to SRAM limitations, unless the NN is very small), though FeedForward maths will work! [...] (since the first release I haven't tested it again on the ATtiny85, at least not yet, so I am not 100% sure)
Note that DFLOAT means float, unless you USE_64_BIT_DOUBLE, then it means double. IS_CONST means nothing, unless you USE_PROGMEM, then it means const.
(NN) Neural-Network's Constructors: besides the default constructors, some constructors are only available under certain conditions (some of them only if backpropagation is available). The parameters used across them are:
- IS_CONST DFLOAT *default_Bias
- IS_CONST DFLOAT *default_Weights
- byte *_ActFunctionPerLayer = NULL
- const unsigned int *layer_
- const unsigned int &NumberOflayers
- const DFLOAT &LRw (learning rate of weights)
- const DFLOAT &LRb (learning rate of biases)
NN Functions | Input Type (x) | Output Type (Y) | Action |
---|---|---|---|
FeedForward_Individual(x) | DFLOAT | DFLOAT Array | RAM-optimized FeedForward: "feeds" the NN with each X-input individually until it returns the Y-output values, if needed. (Almost no RAM usage for the input layer, see also: example) |
*FeedForward(x) | DFLOAT Array | DFLOAT Array | Returns the output of the NN: "feeds" the NN with X-input values and returns the Y-output values, if needed. |
BackProp(x) | DFLOAT Array | - | Trains the NN: "tells" the NN what the correct/expected output for the X-inputs was and then "teaches" it. |
load(x) | String | bool | Loads the NN from SD. Available if #include <SD.h> |
save(x) | String \ int | bool \ int | Saves the NN to storage media |
print() | - | String | Prints the specs of the NN (if _1_OPTIMIZE B10000000, prints from PROGMEM) |
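The FeedForward/BackProp cycle described in the table can be sketched in pure Python as below. This is not the library's implementation: the class TinyMLP, its methods feed_forward/back_prop, the random initialization and the plain gradient-descent update on 0.5 * SSE are all assumptions for illustration. It does follow the idea, described in the RAM-optimization note further down, of keeping only post-activation outputs and computing the sigmoid derivative from them (fx - fx * fx).

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """Hypothetical pure-Python stand-in for the FeedForward/BackProp cycle.

    weights[l][j][i]: layer-to-layer step l, output neuron j, input i (outputs x inputs)
    biases[l][j]:     one bias per output neuron (like MULTIPLE_BIASES_PER_LAYER)
    """

    def __init__(self, layers, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.lr = lr
        self.weights = [[[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
                        for n_in, n_out in zip(layers, layers[1:])]
        self.biases = [[rng.uniform(-1, 1) for _ in range(n_out)] for n_out in layers[1:]]
        self.outputs = []  # post-activation values only, one list per layer

    def feed_forward(self, x):
        self.outputs = [list(x)]
        for w, b in zip(self.weights, self.biases):
            prev = self.outputs[-1]
            self.outputs.append([sigmoid(sum(wi * xi for wi, xi in zip(row, prev)) + bj)
                                 for row, bj in zip(w, b)])
        return self.outputs[-1]

    def back_prop(self, expected):
        """One gradient-descent step on 0.5 * SSE; call feed_forward first."""
        out = self.outputs[-1]
        # gamma = dLoss/dz of the output layer, using SigmoidDer(fx) = fx - fx * fx
        gamma = [(o - e) * (o - o * o) for o, e in zip(out, expected)]
        for l in reversed(range(len(self.weights))):
            prev = self.outputs[l]
            if l > 0:
                # gamma of the previous layer, computed before this layer's weights change
                pre_gamma = [sum(self.weights[l][j][i] * g for j, g in enumerate(gamma))
                             * (p - p * p) for i, p in enumerate(prev)]
            for j, g in enumerate(gamma):
                for i, p in enumerate(prev):
                    self.weights[l][j][i] -= self.lr * g * p
                self.biases[l][j] -= self.lr * g
            if l > 0:
                gamma = pre_gamma


net = TinyMLP([2, 3, 1])
x, y = [1.0, 0.0], [1.0]
before = (net.feed_forward(x)[0] - y[0]) ** 2
net.back_prop(y)
after = (net.feed_forward(x)[0] - y[0]) ** 2
print(before, after)  # a small gradient step should lower this sample's squared error
```

Note the order: back_prop relies on the outputs stored by the most recent feed_forward call, which is why the sketch always feeds forward before training on a sample.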
There is no need to #define MEAN_SQUARED_ERROR: MSE is the default loss and it is always enabled. The only case in which you also need to define MSE in your sketch is when you want to use it alongside another loss function. To use any loss function other than MSE, just define it as seen below.
Loss Functions | Enabling MACRO |
---|---|
NN.getMeanSqrdError(unsigned int batch_size) | #define MEAN_SQUARED_ERROR |
NN.getBinaryCrossEntropy(unsigned int batch_size) | #define BINARY_CROSS_ENTROPY |
NN.getCategoricalCrossEntropy(unsigned int batch_size) | #define CATEGORICAL_CROSS_ENTROPY |
To use any of the variables below, you first need to #define a loss function, as described above.
Loss variables | Sum variables |
---|---|
NN.MeanSqrdError | NN.sumSquaredError |
NN.BinaryCrossEntropy | NN.sumOfBinaryCrossEntropy |
NN.CategoricalCrossEntropy | NN.sumOfCategoricalCrossEntropy |
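For reference, the textbook binary and categorical cross-entropy formulas look like this in Python. This sketch is not taken from the library: in particular, averaging BCE over every output of every sample and CCE over the batch is an assumption about how getBinaryCrossEntropy/getCategoricalCrossEntropy normalize their sums. The EPS clamp and the function names are also assumptions, added for illustration and to avoid log(0).

```python
import math

EPS = 1e-7  # assumed clamp so that log(0) is never evaluated


def binary_cross_entropy(predicted, expected):
    """Mean BCE over every output of every sample (assumed normalization)."""
    total = 0.0  # plays the role of a "sum of BCE" accumulator
    count = 0
    for y_hat, y in zip(predicted, expected):
        for p, e in zip(y_hat, y):
            p = min(max(p, EPS), 1 - EPS)
            total += -(e * math.log(p) + (1 - e) * math.log(1 - p))
            count += 1
    return total / count


def categorical_cross_entropy(predicted, expected):
    """Mean CCE per sample: -sum(e * log(p)) over classes, averaged over the batch."""
    total = 0.0
    for y_hat, y in zip(predicted, expected):
        total += -sum(e * math.log(min(max(p, EPS), 1.0)) for p, e in zip(y_hat, y))
    return total / len(expected)


print(binary_cross_entropy([[0.5]], [[1.0]]))               # ln(2) ≈ 0.6931
print(categorical_cross_entropy([[0.25, 0.75]], [[0, 1]]))  # -ln(0.75) ≈ 0.2877
```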
Because of (my uncertainty and) the strict RAM optimization, which lets the library use a single array storing only the values after the activation instead of two arrays storing the values before and after it, some derivative functions are not supported in backpropagation by this library at the moment, as also indicated by the NO_BACKPROP macro below. This means that if you #define any of the functions 8-13 listed under "NO_BACKPROP support", you won't be able to use backpropagation.
No. | Enabling MACRO | Activation Functions | Returns |
---|---|---|---|
0 | #define Sigmoid | NN.layers->Sigmoid(&x) | 1/(1+e^(-x)) |
1 | #define Tanh | NN.layers->Tanh(&x) | (e^(2*x)-1)/(e^(2*x)+1) |
2 | #define ReLU | NN.layers->ReLU(&x) | (x>0)?x:0 |
3 | #define LeakyELU | NN.layers->LeakyELU(&x) | (x>0)?x:AlphaLeaky*x |
4 | #define ELU | NN.layers->ELU(&x) | (x>0)?x:AlphaELU*(e^(x)-1) |
5 | #define SELU | NN.layers->SELU(&x) | (x>0)?x:AlphaSELU*(e^(x)-1) |
6 | #define Softmax | NN.layers->Softmax(&x) | void "complicated implementation" |
7 | #define Identity | NN.layers->Identity(&x) | x |
NO_BACKPROP SUPPORT | | | |
8 | #define BinaryStep | NN.layers->BinaryStep(&x) | (x < 0) ? 0 : 1 |
9 | #define Softplus | NN.layers->Softplus(&x) | log(1 + exp(x)) |
10 | #define SiLU | NN.layers->SiLU(&x) | x / (1 + exp(-x)) |
11 | #define GELU | NN.layers->GELU(&x) | (1/2) * x * (1 + erf(x / sqrt(2))) |
12 | #define Mish | NN.layers->Mish(&x) | x * Tanh(log(1 + exp(x))) |
13 | #define Gaussian | NN.layers->Gaussian(&x) | exp(-(x*x)) |
Derivative Functions | | | |
0 | #define Sigmoid | NN.layers->SigmoidDer(&fx) | fx-fx*fx |
1 | #define Tanh | NN.layers->TanhDer(&fx) | 1-fx*fx |
2 | #define ReLU | NN.layers->ReLUDer(&fx) | (fx>0)?1:0 |
3 | #define LeakyELU | NN.layers->LeakyELUDer(&fx) | (fx>0)?1:AlphaLeaky |
4 | #define ELU | NN.layers->ELUDer(&fx) | (fx>0)?1:fx+AlphaELU |
5 | #define SELU | NN.layers->SELUDer(&fx) | (fx>0)?LamdaSELU:fx+AlphaSELU*LamdaSELU |
6 | #define Softmax | NN.layers->SoftmaxDer(&fx) | fx * (1 - fx) |
7 | #define Identity | NN.layers->IdentityDer(&x) | 1 |
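Because only the post-activation values are kept, the derivative functions above take fx = f(x) instead of x. The Python sketch below (an illustration, not library code) checks a few of those formulas against a central-difference numerical derivative, using the default AlphaLeaky and AlphaELU values from the activation-variables table. The PAIRS dictionary and the max_derivative_error helper are hypothetical names.

```python
import math

ALPHA_LEAKY = 0.01  # default NN.AlphaLeaky
ALPHA_ELU = 1.0     # default NN.AlphaELU

# (activation f(x), derivative written in terms of fx = f(x)), as in the tables above
PAIRS = {
    "Sigmoid": (lambda x: 1 / (1 + math.exp(-x)), lambda fx: fx - fx * fx),
    "Tanh": (math.tanh, lambda fx: 1 - fx * fx),
    "LeakyELU": (lambda x: x if x > 0 else ALPHA_LEAKY * x,
                 lambda fx: 1.0 if fx > 0 else ALPHA_LEAKY),
    "ELU": (lambda x: x if x > 0 else ALPHA_ELU * (math.exp(x) - 1),
            lambda fx: 1.0 if fx > 0 else fx + ALPHA_ELU),
}


def max_derivative_error(name, points=(-2.0, -0.5, 0.3, 1.7), h=1e-6):
    """Largest gap between the fx-based derivative and a central difference."""
    f, der = PAIRS[name]
    worst = 0.0
    for x in points:  # points avoid x = 0, where ReLU-like functions have a kink
        numeric = (f(x + h) - f(x - h)) / (2 * h)
        worst = max(worst, abs(der(f(x)) - numeric))
    return worst


for name in PAIRS:
    print(name, max_derivative_error(name))  # all tiny
```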
If you want to use an activation function other than the default one, just define another one:
```cpp
#define Sigmoid //[default] No need to define it, for a single activation across the network
#define Tanh
#define ReLU
#define LeakyELU
#define ELU
#define SELU
...
```
Use any activation function per layer-to-layer, like:
```cpp
#define ACTIVATION__PER_LAYER
#include <NeuralNetwork.h>
unsigned int layers[] = {3, 4, ..., 2, 1};
byte Actv_Functions[] = { 1, ..., 2, 0};
// Tanh > ... > ReLU > Sigmoid
```
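To picture what ACTIVATION__PER_LAYER does, here is a hypothetical Python sketch: layers has N entries while Actv_Functions has N-1, one index per layer-to-layer step, and each index picks a function from the default numbering (0 = Sigmoid, 1 = Tanh, 2 = ReLU). The feed_forward helper, the concrete layer sizes and the zero weights are made up for illustration, since the "..." in the example above is left open.

```python
import math

# Default numbering from the activation table: 0 = Sigmoid, 1 = Tanh, 2 = ReLU
ACTIVATIONS = [
    lambda x: 1 / (1 + math.exp(-x)),  # 0: Sigmoid
    math.tanh,                         # 1: Tanh
    lambda x: x if x > 0 else 0.0,     # 2: ReLU
]


def feed_forward(x, layers, actv_functions, weights, biases):
    """Hypothetical helper: one activation index per layer-to-layer step."""
    if len(actv_functions) != len(layers) - 1:
        raise ValueError("need exactly len(layers) - 1 activation indices")
    for step, act_index in enumerate(actv_functions):
        act = ACTIVATIONS[act_index]
        # weights[step] is outputs x inputs, biases[step] has one bias per output
        x = [act(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights[step], biases[step])]
    return x


layers = [3, 4, 2, 1]       # made-up sizes standing in for {3, 4, ..., 2, 1}
actv_functions = [1, 2, 0]  # Tanh > ReLU > Sigmoid
weights = [[[0.0] * n_in for _ in range(n_out)] for n_in, n_out in zip(layers, layers[1:])]
biases = [[0.0] * n_out for n_out in layers[1:]]
print(feed_forward([1.0, 2.0, 3.0], layers, actv_functions, weights, biases))  # [0.5]
```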
If you want to drastically reduce ROM (and slightly reduce RAM) usage, you can define which functions to use/compile, like:
```cpp
#define ACTIVATION__PER_LAYER
#define Sigmoid // 0
//#define Tanh
//#define ReLU
//#define LeakyELU
#define ELU // 1
#define SELU // 2
...
#include <NeuralNetwork.h>
unsigned int layers[] = {3, 4, ..., 2, 1};
byte Actv_Functions[] = { 1, ..., 2, 0};
// ELU > ... > SELU > Sigmoid
```
(See also example.) You can define up to 5 custom activation functions. Every custom function comes after all the non-custom ones (numerically), e.g.:
```cpp
#define ACTIVATION__PER_LAYER
#define Sigmoid // 0
//#define Tanh
//#define ReLU
//#define LeakyELU
#define ELU // 1
#define SELU // 2
#define CUSTOM_AF1 my_act_fun1 // 3
#define CUSTOM_AF2 my_act_fun2 // 4
...
```
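The numbering rule above (only compiled-in functions get an index, built-ins keep their table order, custom functions come last) can be modelled with a few lines of Python. This is a hypothetical model written for illustration, not code from the library; activation_indices and BUILT_IN are made-up names. It reproduces both examples in this section.

```python
# Canonical order of the built-in activation functions (0-13 in the table above)
BUILT_IN = ["Sigmoid", "Tanh", "ReLU", "LeakyELU", "ELU", "SELU", "Softmax", "Identity",
            "BinaryStep", "Softplus", "SiLU", "GELU", "Mish", "Gaussian"]


def activation_indices(enabled_built_ins, custom_functions):
    """Map each compiled-in function to the index used in Actv_Functions[].

    Hypothetical model of the rule described above: enabled built-ins keep their
    relative order and are renumbered from 0, then up to 5 custom functions follow.
    """
    if len(custom_functions) > 5:
        raise ValueError("at most 5 custom activation functions (CUSTOM_AF1..5)")
    ordered = [name for name in BUILT_IN if name in enabled_built_ins]
    return {name: i for i, name in enumerate(ordered + list(custom_functions))}


print(activation_indices({"Sigmoid", "ELU", "SELU"}, ["my_act_fun1", "my_act_fun2"]))
# {'Sigmoid': 0, 'ELU': 1, 'SELU': 2, 'my_act_fun1': 3, 'my_act_fun2': 4}
```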
Define derivative functions by just defining ..._DFX:
```cpp
#define CUSTOM_AF1 my_act_fun1
#define CUSTOM_DF1
```
And then use them in your sketch like:
```cpp
// CUSTOM_DF1 is optional ...
#define ACTIVATION__PER_LAYER
#define Tanh
#define CUSTOM_AF1 my_sigmoid
#define CUSTOM_DF1
#include <NeuralNetwork.h>

// derivative function must end in "Der" | Limited to f(x), for optimization reasons
float NeuralNetwork::Layer::my_sigmoidDer(const float &fx){ return fx - fx * fx; }
float NeuralNetwork::Layer::my_sigmoid (const float &x ){ return 1 / (1 + exp(-x)); }

unsigned int layers[] = {3, 4, ..., 2, 1};
byte Actv_Functions[] = { 0, ..., 0, 1};
// Tanh > ... > Tanh > my_sigmoid
```
IMPORTANT NOTE: Be careful commenting in front of #define, see issue #29
Enabling MACRO | Activation Variables | Default | Explanation |
---|---|---|---|
#define LeakyELU | NN.AlphaLeaky | 0.01 | the α of Leaky |
#define ELU | NN.AlphaELU | 1 | the α of ELU |
#define SELU | NN.AlphaSELU | 1.6733 | the α of SELU |
#define SELU | NN.LamdaSELU | 1.0507 | the λ of SELU |
Note that, except for _numberOfInputs and _numberOfOutputs, none of the variables below are valid when you USE_INTERNAL_EEPROM.
Type | NN's Variables | Explanation |
---|---|---|
byte* | NN.ActFunctionPerLayer | if ACTIVATION__PER_LAYER defined |
DFLOAT | NN.LearningRateOfWeights | The Learning-Rate-Of-Weights |
DFLOAT | NN.LearningRateOfBiases | The Learning-Rate-Of-Biases |
DFLOAT* | NN.weights | If REDUCE_RAM_WEIGHTS_LVL2 |
Layer* | NN.layers | Layers of NN |
Layer's Variables | | |
DFLOAT* | NN.layers[i].bias | The bias of an individual layer[i], unless NO_BIAS or MULTIPLE_BIASES_PER_LAYER is enabled. |
DFLOAT* | NN.layers[i].outputs[] | The output array of an individual layer[i] |
DFLOAT** | NN.layers[i].weights[][] | if not REDUCE_RAM_WEIGHTS_LVL2 |
DFLOAT* | NN.layers[i].preLgamma[] | The γ-error of the previous layer[i-1] |
unsigned int | NN.layers[i]._numberOfInputs | The layer[i]'s number of inputs/nodes |
unsigned int | NN.layers[i]._numberOfOutputs | The number of outputs of an individual layer[i] |
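When weights live in one flat array (the NN.weights variable under REDUCE_RAM_WEIGHTS_LVL2, or a float weights[] array like the one printed by the TensorFlow script in the training section), a natural layout is layer by layer, then output neuron by output neuron, then input by input, which is the order that script prints them in. The helper below is a hypothetical sketch of that indexing, assuming this layout; it is not the library's code.

```python
def flat_weight_index(layers, layer, output, input_):
    """Index of weights[layer][output][input_] in a flat, layer-by-layer array.

    Assumed layout: for each layer-to-layer step, one row of _numberOfInputs
    weights per output neuron (outputs x inputs).
    """
    # Skip all weights of the earlier layer-to-layer steps
    offset = sum(n_in * n_out for n_in, n_out in zip(layers[:layer], layers[1:layer + 1]))
    n_inputs = layers[layer]  # _numberOfInputs of this step
    return offset + output * n_inputs + input_


layers = [3, 4, 1]  # 3 inputs -> 4 hidden -> 1 output: 3*4 + 4*1 = 16 weights
print(flat_weight_index(layers, 0, 0, 0))  # 0
print(flat_weight_index(layers, 0, 3, 2))  # 11, last weight of the first step
print(flat_weight_index(layers, 1, 0, 3))  # 15, last weight overall
```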
```cpp
#define _1_OPTIMIZE B00000000
```
_1_OPTIMIZE | | Action | Keyword |
---|---|---|---|
B00000000 | | Nothing | |
B10000000 | | Use PROGMEM instead of RAM: enables the use of program memory instead of RAM to store and use weights and biases | USE_PROGMEM |
B01000000 | | Deletes previous layer's outputs: for each layer-to-layer input-to-output operation of the internal feedforward, it deletes the previous layer's outputs. Reduces RAM by approximately ((the sum of each layer's _numberOfOutputs) - (the _numberOfOutputs of the biggest layer)) * (4 [float] or 8 [double]) bytes, I think? | REDUCE_RAM_DELETE_OUTPUTS |
B00100000 | ❌ | Reduces RAM for weights, level 1 (partial reduction). Not yet implemented | REDUCE_RAM_WEIGHTS_LVL1 |
B00010000 | 📌 | Reduces RAM for weights, level 2, by (number_of_layers-1)*2 bytes | REDUCE_RAM_WEIGHTS_LVL2 |
B00001000 | 🟢 | Deletes previous layer's gamma. Always enabled (not switchable yet.) | REDUCE_RAM_..._LAYER_GAMMA |
B00000100 | ⓘ | Reduces RAM using a static reference... to the NN-object (for layers), by 2*(number_of_layers - 1 or 2) bytes. Note that, with this optimization, when you are using multiple NN-objects interchangeably in your sketch, you should always update NN.me before using the next one | REDUCE_RAM_STATIC_REFERENCE |
B00000010 | 📌 | Disables MSE function: disables the default loss function. Reduces ROM, RAM & CPU consumption, although it is usually needed for backpropagation | DISABLE_MSE |
B00000001 | ⓘ | Use 8-byte double instead of float. This will work only if your MCU supports 8-byte doubles, e.g. Arduino UNO DOESN'T | USE_64_BIT_DOUBLE |
_2_OPTIMIZE | | | |
B10000000 | | Use internal EEPROM instead of RAM: weights, biases, and activation functions are stored into and used from the internal EEPROM of the MCU. Additionally, this means REDUCE_RAM_WEIGHTS_LVLX has no effect. See also: example | USE_INTERNAL_EEPROM |
B01000000 | | Use NN without biases: disables the use of biases in the entire NN | NO_BIAS |
B00100000 | | Use more than 1 bias, layer-to-layer: enables the use of a unique bias for each unit/neuron of each layer-to-layer | MULTIPLE_BIASES_PER_LAYER |
B00010000 | | Use F() macro for print function: Serial.print(...) strings are normally saved in RAM; this ensures they are stored in PROGMEM instead (at least for Arduino boards) | MULTIPLE_BIASES_PER_LAYER |
Please don't use keywords to define optimizations, use _X_OPTIMIZE.
- ⚠️ = Backpropagation is not allowed
- 🟢 = Always enabled (not switchable yet.)
- ❌ = Not yet implemented
- 📌 = Recommended
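Because each optimization is a single bit, several can be combined by OR-ing them into one _1_OPTIMIZE/_2_OPTIMIZE value. The Python sketch below only does that bit arithmetic (Python's 0b literals play the role of Arduino's B... constants) and evaluates the approximate RAM saving quoted for REDUCE_RAM_DELETE_OUTPUTS. The helper names as_arduino_binary and delete_outputs_saving are hypothetical, and treating layers[i + 1] as each layer's _numberOfOutputs is an assumption.

```python
# _1_OPTIMIZE bits from the table above
USE_PROGMEM = 0b10000000
REDUCE_RAM_DELETE_OUTPUTS = 0b01000000
REDUCE_RAM_WEIGHTS_LVL2 = 0b00010000
REDUCE_RAM_STATIC_REFERENCE = 0b00000100
DISABLE_MSE = 0b00000010
USE_64_BIT_DOUBLE = 0b00000001


def as_arduino_binary(value):
    """Format a byte the way it is written in a sketch, e.g. B11010000."""
    return "B" + format(value, "08b")


def delete_outputs_saving(layers, bytes_per_value=4):
    """Approximate RAM saved by REDUCE_RAM_DELETE_OUTPUTS, per the formula above:
    (sum of every layer's _numberOfOutputs - the biggest one) * sizeof(DFLOAT)."""
    outputs = layers[1:]  # assumption: layer-to-layer step i produces layers[i + 1] outputs
    return (sum(outputs) - max(outputs)) * bytes_per_value


flags = USE_PROGMEM | REDUCE_RAM_DELETE_OUTPUTS | REDUCE_RAM_WEIGHTS_LVL2
print("#define _1_OPTIMIZE " + as_arduino_binary(flags))  # #define _1_OPTIMIZE B11010000
print(bool(flags & DISABLE_MSE))                          # False
print(delete_outputs_saving([3, 9, 9, 1]))                # (19 - 9) * 4 = 40 bytes
```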
To train a neural network, you can also use TensorFlow. Here's a basic Python example:
```python
# pip install tensorflow
from tensorflow.keras.optimizers import Adam
import tensorflow as tf
import numpy as np

# Define if you want to use biases
IS_BIASED = True

# Enable 32-bit floating-point precision
tf.keras.backend.set_floatx('float32')

# Define the (3-input) XOR gate inputs and outputs
inputs = np.array([
    [ 0, 0, 0 ],
    [ 0, 0, 1 ],
    [ 0, 1, 0 ],
    [ 0, 1, 1 ],
    [ 1, 0, 0 ],
    [ 1, 0, 1 ],
    [ 1, 1, 0 ],
    [ 1, 1, 1 ]
], dtype = np.float32)

outputs = np.array([[0], [1], [1], [0], [1], [0], [0], [1]], dtype = np.float32)
input_size = 3

# Create a simple fully-connected (dense) neural network
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(input_size,)),                          # Input layer (no bias)
    tf.keras.layers.Dense(3, activation='sigmoid', use_bias=IS_BIASED),  # Dense 3 units
    tf.keras.layers.Dense(1, activation='sigmoid', use_bias=IS_BIASED)   # Output 1 unit
])

# Compile the model
optimizer = Adam(learning_rate=0.031)
model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(inputs, outputs, epochs=900, verbose=0)

# Evaluate the model on the training data
loss, accuracy = model.evaluate(inputs, outputs)
print(f"Model accuracy: {accuracy * 100:.2f}%")

# Predict XOR gate outputs
predictions = model.predict(inputs)
print("Predictions:")
for i in range(len(inputs)):
    print(f"Input: {inputs[i]}, Predicted Output: {predictions[i][0]:.7f}")

# Print biases and weights
# (IMPORTANT NOTE! they are printed as w[i][j] not w[j][i] | outputs * inputs)
print()
weights_biases = model.get_weights()
# get_weights() returns [W1, b1, W2, b2, ...] when biased, or just [W1, W2, ...] when not
layer_weights = weights_biases[::2] if IS_BIASED else weights_biases

if IS_BIASED:
    print("#define _2_OPTIMIZE B00100000 // MULTIPLE_BIASES_PER_LAYER \n")
    print('float biases[] = {')
    for b in weights_biases[1::2]:
        print('  ', end='')
        for j in range(b.shape[0]):
            print(b[j], end=', ')
        print()
    print('};\n')
else:
    print("#define _2_OPTIMIZE B01000000 // NO_BIAS \n")

print('float weights[] = {', end="")
for w in layer_weights:
    print()
    # Keras stores each kernel as (inputs, outputs), so iterate over outputs first
    for j in range(w.shape[1]):
        print('  ', end='')
        for i in range(w.shape[0]):
            print(w[i][j], end=', ')
        print()
print('};\n')
```
IMPORTANT NOTE: See how the weights and biases are printed at the end of the script, and make sure you have enabled/defined (at the top of your sketch) either _2_OPTIMIZE B00100000 // MULTIPLE_BIASES_PER_LAYER or _2_OPTIMIZE B01000000 // NO_BIAS, depending on your needs. Additionally, if you want to use just 1 bias per layer-to-layer, don't use either of those 2 optimizations (although, just so you know... TensorFlow doesn't seem to support 1 bias per layer-to-layer). Finally, make sure to use float32 unless your MCU is compatible and you want to use the USE_64_BIT_DOUBLE optimization (see also the examples on how to train a NN directly on an MCU).
I really want to thank Underpower Jet for his amazing tutorial, and to bring it more to the surface, because after all the videos and links I came across, he was the one who made the most significant difference to my understanding of backpropagation in neural networks. Plus, I would like to thank: giant_neural_network for this and this, 3Blue1Brown for this, the authors of ✨ this scientific article for referencing me, Ivo Ljubičić for using my library for his ✨ master thesis, the Arduino community, and everyone else who gave me the opportunity to learn and made this library possible [...]
Here are most of the resources I came across on the internet. I recommend you have a look if you want to (but please be aware that some of those sites I only opened, checked something on, and closed within a matter of seconds [so please don't take them all seriously]).
22\11\2023
Code Related:
Macros:
- Do not put comments in front of #define whatever
Arduino:
StackOverflow:
Neural Network Related
General:
12\08\2021
Neural Network Related
Videos:
Softmax:
StackOverflow\Exchange:
General:
Code Related:
Tools:
Macros:
Arduino:
StackOverflow:
General:
General:
Math:
Grammar:
Just "Random":
xx\xx\202x
Neural Network Related
kind of Interesting To me
General
Activation Function Related
Gradient Explosion and clipping Related
MNIST Related
Related to Programming
to C-type Languages
to Python
Other
Arduino Related
(#) MACROS / pre-processor directives | Interesting | (NN.) Neural Network(s) | (A.) Arduino etc. | (-) Mostly .NET & Other | (*) Maybe Interesting?
Please consider donating something; even the smallest amount would be really appreciated.
| Monero address: 87PVyQ8Vt768hnvVyR9Qw1NyGzDea4q9Zd6AuwHb8tQBU9VdRYjRoBL7Ya8yRPVQakW2pjt2UWEtzYoxiRd7xpuB4XSJVAW
- Forgive me if I've made any mistakes, and please don't take every claim I make seriously. I am mainly "self-taught" in this field of Neural Networks; I am not a professional programmer, nor do I have good knowledge of many of the fields used to create this library. I just make things because I love to [...]
- Also looking for jobs: if you are interested, let me know. I really like working with embedded systems, C\C++, Python, CLIs, etc.
If you want to help me and others educate ourselves better, and if you have a love and passion for sharing and helping, then I suggest you join our Discord server 🤍
My Instagram account is: giorgos.xou ;) Feel free to ask me anything.