geosp / hello_brain

Neural networks with brain.js

Machine Learning with Neural Networks

Artificial Neurons

The basic unit of work in a neural network is the Artificial Neuron. An Artificial Neuron has an associated potential to emit a signal. For convenience the value of the potential is kept between $0$ and $1$. If the potential is $1$ the neuron is active; if it is $0$ the neuron is inactive. We can implement the Artificial Neuron as a function $f$ with an array of activation values, i.e. $a = [a_1, a_2, \dots, a_n]$, in its internal scope. The function parameter is an array of weight values $w = [w_1, w_2, \dots, w_n]$. The output then is the signal $s$. The values $a$ and $w$ are defined as tensors because the types of operations or functions that will be used to manipulate Artificial Neurons come from a branch of mathematics called Tensor Analysis. Consider the implementation of $f$ based on the following:

  • We define tensors $a = [a_1, a_2, \dots, a_n]$ and $w = [w_1, w_2, \dots, w_n]$.
  • Multiply tensors $a$ and $w$ component-wise, i.e. $a \times w$.
  • The tensor product will be $[a_1 w_1, a_2 w_2, \dots, a_n w_n]$.
  • Reduce the product to a scalar value by adding its components.
  • The sum of the components is $a_1 w_1 + a_2 w_2 + \dots + a_n w_n$.
  • This sum is called a weighted sum and is represented by $\sum_{i=1}^{n} a_i w_i$, where $n$ is the number of elements in $a$.
  • The weighted sum determines the strength of the signal emitted by the Artificial Neuron.
  • Capping the weighted sum adds additional control over signal emission and is done by subtracting a bias $b$ from the sum.
  • It is possible for $\sum_{i=1}^{n} a_i w_i - b$ to have a value outside the desired signal strength range $[0, 1]$. For this reason an activation function is used to bring the value into the desired range.
  • One of the commonly used activation functions is the sigmoid $\sigma(x) = \frac{1}{1 + e^{-x}}$.

In conclusion, the implementation of an Artificial Neuron is the function $f(w) = \sigma\left(\sum_{i=1}^{n} a_i w_i - b\right)$.
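
As a minimal sketch (not the brain.js internals; `makeNeuron` and the sample values are illustrative), this Artificial Neuron could be written in TypeScript like so:

```typescript
// Sigmoid activation: maps any real number into the range (0, 1).
const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));

// An Artificial Neuron closes over its activation values `a` and bias `b`
// and takes the incoming weight values `w` as its parameter.
const makeNeuron = (a: number[], b: number) => (w: number[]): number => {
  // Weighted sum: multiply the tensors component-wise, then reduce to a scalar.
  const weightedSum = a.reduce((sum, ai, i) => sum + ai * w[i], 0);
  // Subtract the bias and squash with the sigmoid to produce the signal s.
  return sigmoid(weightedSum - b);
};

// Example: a neuron with two activation values and a bias of 0.5.
const f = makeNeuron([0.8, 0.2], 0.5);
console.log(f([0.5, 0.9])); // a signal s in (0, 1)
```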

Neural Networks

A neural network is a computational graph of Artificial Neurons. Neural networks are composed of neural network layers. A neural network layer is a tensor of Artificial Neurons. The Artificial Neurons in a neural network layer are connected to each other because they are components of a tensor. We can define layer $n$ as $l_n = [f_1, f_2, \dots, f_m]$. Neural networks have three layer types: input, hidden, and output. A neural network may have multiple hidden layers but only one input and one output layer. Consider a neural network consisting of the following layers:

$l_1 = [f_{1,1}]$ (input layer), $l_2 = [f_{2,1}, f_{2,2}]$ (hidden layer), $l_3 = [f_{3,1}]$ (output layer)

Neural networks themselves are tensors. In this case the neural network is $N = [l_1, l_2, l_3]$. Artificial Neurons in a neural network are associated to each other via function composition. Consider $f_{1,1}$: it has an internal tensor of activation values $a$. The number of components in $a$ is one. The output of $f_{1,1}$ is a potential $s_{1,1}$. The key question one must ask at this point is: how is the number of activation values in $l_1$ associated to the number of activation values in $l_2$? Here is where the magic happens: $s_{1,1}$ becomes the input weight for $f_{2,1}$ and $f_{2,2}$. This means that $[s_{1,1}]$ becomes a weight value tensor and the input for $f_{2,1}$ and $f_{2,2}$. It also means that the activation values tensor for each of $f_{2,1}$ and $f_{2,2}$ has one component, because the input layer consists of only one component, $f_{1,1}$. It is important to notice that the number of activation values in a layer's Artificial Neurons is determined by the number of Artificial Neurons in the previous layer.

For completeness let's consider the output layer Artificial Neuron $f_{3,1}$. Based on our current understanding, $f_{3,1}$ has an internal tensor of activation values $a = [a_1, a_2]$ because $l_2$ has two components, $f_{2,1}$ and $f_{2,2}$. The output of $f_{2,1}$ is a potential $s_{2,1}$ and the output of $f_{2,2}$ is a potential $s_{2,2}$, therefore the weight value tensor is $w = [s_{2,1}, s_{2,2}]$. The weighted sum for $f_{3,1}$ is $a_1 s_{2,1} + a_2 s_{2,2}$ and its potential is $s_{3,1} = \sigma(a_1 s_{2,1} + a_2 s_{2,2} - b)$. Notice how all Artificial Neurons in every layer of the neural network are relaying information, i.e. emitting a signal directly or indirectly to each other in a forward direction. The type of neural network where all Artificial Neurons in adjacent layers are connected to each other is called a dense neural network.
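
A minimal TypeScript sketch of this 1-2-1 dense network, following the convention above that a neuron's parameter is the tensor of signals emitted by the previous layer (the activation values and biases are illustrative):

```typescript
const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));

type Neuron = (w: number[]) => number;

const makeNeuron = (a: number[], b: number): Neuron => (w) =>
  sigmoid(a.reduce((sum, ai, i) => sum + ai * w[i], 0) - b);

// l1: input layer with one neuron; l2: hidden layer with two neurons;
// l3: output layer with one neuron whose activation tensor has two
// components because l2 has two components.
const l1: Neuron[] = [makeNeuron([1.0], 0.1)];
const l2: Neuron[] = [makeNeuron([0.4], 0.2), makeNeuron([0.7], 0.3)];
const l3: Neuron[] = [makeNeuron([0.6, 0.9], 0.4)];

// Forward propagation: the signals of each layer become the weight
// values (i.e. the input) for the next layer.
const forward = (layers: Neuron[][], input: number[]): number[] =>
  layers.reduce((signals, layer) => layer.map((f) => f(signals)), input);

console.log(forward([l1, l2, l3], [0.5])); // [s31]
```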

Neural Networks In Action

Introduction

We define a neural network algorithm as a function that produces an output in response to an input and $n$ number of hidden layers, i.e. $y = N(x)$. A neural network is a system defined by the tensor $N = [l_1, l_2, \dots, l_n]$. In our daily experience we go through time and we have a state at each moment in time. Our reality is a series of moments in time. At each moment we can assess our state, map any number of metrics to an exact moment in time, and persist the resulting information representing our state. Our memories are our state and we derive knowledge from them. Compared to you or me, $N$ is a very simple system: a moment in time for $N$ is represented by evaluating $a$ and $w$ at a given moment $t$. We bring $N$ to life by feeding it input $x$ and evaluating the output of every Artificial Neuron in each neural network layer $l$.

Back Propagation Training Using The Gradient Descent Algorithm

Back propagation is the most widely used machine learning algorithm. The algorithm's objective is to find, through a training process, the optimal values for $w$ that will yield the expected outputs $y$. The algorithm's steps are:

  1. Initialize Artificial Neurons in $N$ by assigning random values to every $w$ and $b$ in the range $[0, 1]$.
  2. Iterate over the training dataset.
  3. For each item in the dataset, forward propagate by invoking the activation function on every Artificial Neuron from $l_2 \dots l_{n-1}$ (all hidden layers) to $l_n$ (the output layer), using the input value of the item as the input. The signals of the Artificial Neurons in the previous layer become the input for the current layer.
  4. Backwards propagate the error by iterating over the layers in reverse order and calculating the error between the current output $s$ and the expected output $y$, the labeled output for the corresponding input in the dataset. One of the most commonly used error, cost, or loss functions to compare $s$ vs. $y$ is the Mean Squared Error function $E = \frac{1}{n} \sum_{i=1}^{n} (y_i - s_i)^2$. The error indicates how close the signal $s$ is to $y$.
  5. Compute the rate of change of the cost function. The rate of change of a single variable function with one scalar output is called a derivative, i.e. $\frac{df}{dx}$. The rate of change of a multivariable function with one scalar output is called the gradient, i.e. $\nabla E = \left[ \frac{\partial E}{\partial w_1}, \dots, \frac{\partial E}{\partial w_n} \right]$. The gradient indicates the direction and magnitude of greatest increase of the error function. In this case $\nabla E$ needs to be computed since we are dealing with multivariable tensors.
  6. $\nabla E$ needs to be negated because the objective is to advance towards lower error or cost, i.e. $-\nabla E$. Define the learning rate $\eta$, a number in $(0, 1)$, used as a factor that determines the magnitude of change in $w$ in conjunction with $\nabla E$. Define the momentum $\mu$, a number between $0$ and $1$, used as a factor that determines the magnitude of change in $w$ in conjunction with the previous weight change $\Delta w$. The magnitude of $\eta \nabla E$ will determine how big of a step we take in our search to minimize the error or cost $E$. The magnitude of $\mu \Delta w$ will determine how much of an influence the previous values of $\Delta w$ have in our search to minimize the error or cost $E$. Compute the scalar values by which $w$ needs to change in order to decrease the error, i.e. bring $s$ closer to $y$: $\Delta w_t = -\eta \nabla E + \mu \Delta w_{t-1}$ (a sketch of this update step follows the list). Follow the same procedure to fine tune the bias $b$.
  7. After iterating over the complete training dataset, verify that the current error is less than or equal to the error threshold, or that the maximum number of iterations has been reached; if so, stop training, else continue. Each complete iteration over all items in a training set is called an epoch.
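
Here is a sketch of the error measure and update rule from steps 4 and 6. The `gradient` argument is assumed to have been computed already (deriving $\frac{\partial E}{\partial w_i}$ per weight depends on the network's structure); all names are illustrative:

```typescript
// Mean Squared Error between expected outputs y and emitted signals s.
const mse = (y: number[], s: number[]): number =>
  y.reduce((sum, yi, i) => sum + (yi - s[i]) ** 2, 0) / y.length;

// One gradient descent step for a weight tensor, with learning rate (eta)
// and momentum (mu) as described in step 6.
const step = (
  w: number[],
  gradient: number[],      // dE/dw for each weight, assumed precomputed
  previousDelta: number[], // the previous step's weight changes
  learningRate: number,    // eta, a number in (0, 1)
  momentum: number         // mu, a number between 0 and 1
): { w: number[]; delta: number[] } => {
  // Move against the gradient (towards lower error) while keeping a
  // fraction of the previous step's direction.
  const delta = gradient.map(
    (g, i) => -learningRate * g + momentum * previousDelta[i]
  );
  return { w: w.map((wi, i) => wi + delta[i]), delta };
};
```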

It is crucial to understand that $w$ and the associated biases $b$ change as a result of back propagation, while $a$ changes as a result of forward propagation. This means that properly labeled data is essential for training and for how well $N$ performs. When practicing machine learning you will be presented with the opportunity to adjust so-called hyperparameters (see the brain.js example after this list), some of them being:

  • error threshold.
  • expected number of epochs.
  • learning rate.
  • momentum.
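
In brain.js these hyperparameters map directly onto the documented training options. A quick sketch (the named import assumes a brain.js build that provides TypeScript exports; XOR is just a toy dataset):

```typescript
import { NeuralNetwork } from 'brain.js';

// A small dense network with one hidden layer of three neurons.
const net = new NeuralNetwork({ hiddenLayers: [3] });

net.train(
  [
    { input: [0, 0], output: [0] },
    { input: [0, 1], output: [1] },
    { input: [1, 0], output: [1] },
    { input: [1, 1], output: [0] },
  ],
  {
    errorThresh: 0.005, // stop once the error drops below this threshold
    iterations: 20000,  // maximum number of epochs
    learningRate: 0.3,  // factor applied to the gradient
    momentum: 0.1,      // factor applied to the previous weight change
  }
);

console.log(net.run([1, 0])); // close to [1]
```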

Training Data

The process of preparing training datasets is challenging. The key to the process is proper vectorization and labeling of the training data. Neural networks can be applied to all kinds of problems involving regression, classification, or prediction. The way data is prepared for training requires careful consideration of the domain and the goals one intends to achieve.

Imagine we have a set of data representing the horsepower $hp$ and the miles per gallon $mpg$ of a car model. The array $[model, hp, mpg]$ represents an element in our raw dataset. Our objective is to determine if there is a relationship between $hp$ and $mpg$ and to design a neural network that will help us predict the $mpg$ given the $hp$. To prepare the data for consumption we need to understand what the inputs and outputs for our model are. Since our intent is to predict $mpg$ in relationship to $hp$ regardless of the model, our training data becomes the set of pairs $\{input: hp,\ output: mpg\}$. The last step in the process is data normalization, usually accomplished by min-max feature scaling. The function for min-max feature scaling is $x' = \frac{x - min(x)}{max(x) - min(x)}$, where $x$ is any value, $max(x)$ is the maximum, and $min(x)$ is the minimum in the array. Normalization assures that the value $x'$ is always within the range $[0, 1]$. In our case study the input is the normalized $hp$ and the expected output is the normalized $mpg$. Normalization is necessary because it brings any dataset to the necessary range $[0, 1]$.
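
A sketch of this preparation pipeline in TypeScript; the car models and numbers below are made up purely for illustration:

```typescript
// Raw dataset: [model, horsepower, miles per gallon] (illustrative values).
const raw: [string, number, number][] = [
  ['alpha', 130, 18],
  ['beta', 165, 15],
  ['gamma', 95, 28],
  ['delta', 70, 33],
];

// Min-max feature scaling: x' = (x - min) / (max - min), bringing every
// value into the range [0, 1].
const normalize = (xs: number[]): number[] => {
  const min = Math.min(...xs);
  const max = Math.max(...xs);
  return xs.map((x) => (x - min) / (max - min));
};

const hp = normalize(raw.map(([, h]) => h));
const mpg = normalize(raw.map(([, , m]) => m));

// Training data: normalized horsepower as input, normalized mpg as output.
const trainingData = hp.map((h, i) => ({ input: [h], output: [mpg[i]] }));
console.log(trainingData);
```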

Conclusion

Neural networks are computational graphs used to universally model functions. The majority of relationships represented by functions are not linear; for this reason logistic functions like the sigmoid $\sigma$ or the hyperbolic tangent $\tanh$ are used to modulate signals. They introduce nonlinearity to the Artificial Neuron model, which increases the scope of problems we can solve. Normalization helps by keeping everything at the same scale and allows the system to be more sensitive when recognizing patterns. When a neural network is trained it becomes a function in tensor form specific to the training domain. After training, the acquired knowledge can be preserved by serializing $w$, the associated biases, and all the hyperparameters used during training. The resulting kernel of knowledge is very tiny in comparison to the training data and could be used almost anywhere, including a web browser. When utilizing neural networks for regression, prediction, or classification, the activation values come from the input provided and the weights are not changed. The activation values flow forward from the input layer to the output layer.
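
With brain.js, this kernel of knowledge can be captured with the documented `toJSON`/`fromJSON` methods. A sketch (the single-item training set is a trivial placeholder):

```typescript
import { NeuralNetwork } from 'brain.js';

// After training, the acquired knowledge (weights, biases, and options)
// can be serialized into a small JSON kernel...
const trained = new NeuralNetwork();
trained.train([{ input: [0, 1], output: [1] }]);
const kernel = JSON.stringify(trained.toJSON());

// ...and restored later, e.g. in a web browser, without retraining.
const restored = new NeuralNetwork();
restored.fromJSON(JSON.parse(kernel));
console.log(restored.run([0, 1]));
```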

Examples

In the examples directory you can find several examples using the Brain.JS framework to create and train neural networks.

References

  1. Machine Learning with Neural Networks in JavaScript
  2. Neural Networks and Deep Learning
  3. How Deep Neural Networks Work by Brandon Rohrer
  4. Neural Networks Deep Dive
  5. Scrimba tutorial by Robert Plummer creator of Brain.JS
  6. Lodash
  7. Lodash FP
  8. futil-js
  9. Point-Free Programming
