- I am ashamed of how I tried to structure my ML learning by keeping multiple "session" folders
- I see the error of my ways
Exploring the realm of machine learning
1. Create and run tests
2. Gather data
3. Construct a mathematical model
4. Tweak the model so that it fits the data better and moves closer to the desired result
5. This tweaked model is a neural network
- Artificial neurons are the basic building blocks (elementary units) of any artificial neural network
- An artificial neuron receives one or more inputs and sums them to produce an output or so-called activation
- Usually each input has its own separate weight
- The weighted sum is usually added to a value known as the bias: a value that does not depend on the input and gives the sum the offset it needs before it is forwarded into the activation function
- Usually in the last step we put the sum through a non-linear function known as an activation function that activates our neuron
- Activation functions often have a sigmoid shape, but they may also take the form of other non-linear functions, piecewise linear functions or step functions
- You need the bias because without it the model would not be able to fit the data as well
- Bias helps you control at which value the activation function will trigger
- Without the bias the model can only change the output by scaling the inputs with the weights (with all inputs at 0 the weighted sum is always 0), but with a bias the model can shift the entire output regardless of the input
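A minimal Python sketch of the single neuron described above: the weighted inputs are summed, the bias is added, and the result goes through an activation function. The names `neuron` and `sigmoid` are illustrative, and using the logistic function as the activation is an assumption (a step function or another non-linear function would work too). The last two calls illustrate the bias point: with all inputs at 0 the weights cannot affect the output, but the bias still shifts it.

```python
import math


def sigmoid(x):
    # Logistic function 1 / (1 + e^(-x)), written in two branches so that
    # math.exp never receives a large positive argument (which would overflow)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def neuron(inputs, weights, bias, activation=sigmoid):
    # Each input is multiplied by its own weight and the products are summed
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # The bias is an input-independent offset added to the sum
    z = weighted_sum + bias
    # The result is passed through a non-linear activation function
    return activation(z)


# All inputs are 0, so the weighted sum is 0 no matter what the weights are
print(neuron([0.0, 0.0], [0.5, -1.0], bias=0.0))  # 0.5
# Only the bias can move the output away from sigmoid(0) = 0.5
print(neuron([0.0, 0.0], [0.5, -1.0], bias=2.0))
```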
- With the idea of driving the cost of our model to 0 using derivatives, let's explore a way of approximating derivatives using a method named finite difference
- A thought about the finite difference method:
- Keep in mind that this method is not used to train neural networks in practice because it is slow (it needs an extra evaluation of the cost function for every single parameter) and numerically inaccurate
- However, the finite difference method is useful as part of our learning process when trying to understand how neural networks work
- Let's recap: right now we are trying to find the minimum of the cost function by looking at which direction we should move our parameter `w` in so that we reach that minimum
- From the definition of the derivative we know:
$$L=\lim_{h \to 0}\frac{f(a + h) - f(a)}{h}$$
A function of a real variable $f(x)$ is differentiable at a point $a$ of its domain if its domain contains an open interval $I$ containing $a$ and the limit $L$ exists. This means that, for every positive real number $\epsilon$ (even very small), there exists a positive real number $\delta$ such that, for every $h$ such that $|h|<\delta$ and $h \neq 0$, $f(a+h)$ is defined and
$$\left|L-\frac{f(a+h)-f(a)}{h}\right| < \epsilon$$
where $|\cdot|$ denotes the absolute value
- In other words, we take the difference between the value of $f$ at the point shifted by $h$ and the value of $f$ at the unshifted point, and we divide this difference by that same $h$
- We do this while driving $h$ toward 0
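To see the limit at work, here is a small Python sketch that evaluates the difference quotient for shrinking values of $h$. The function $f(x) = x^2$ and the point $a = 3$ are arbitrary choices for illustration: the exact derivative there is 6, and algebraically the quotient equals $6 + h$, so it approaches 6 as $h$ goes to 0.

```python
def difference_quotient(f, a, h):
    # (f(a + h) - f(a)) / h, the expression inside the limit; h must not be 0
    return (f(a + h) - f(a)) / h


def square(x):
    return x * x


# The quotient equals 6 + h (up to floating-point rounding), so it tends to 6
for h in [1.0, 0.1, 0.01, 0.001]:
    print(h, difference_quotient(square, 3.0, h))
```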
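- From the definition of the (forward) finite difference we know:
$$\Delta_h f(x)=f(x + h) - f(x)$$
- Let's combine both ideas: dividing the finite difference by a small step `eps` approximates the derivative. We use this to compute the error distance of our cost with respect to the weight (`dw`) and with respect to the bias (`db`):
$$dw=\frac{\text{cost}(w + \text{eps}, b) - \text{cost}(w, b)}{\text{eps}}$$
$$db=\frac{\text{cost}(w, b + \text{eps}) - \text{cost}(w, b)}{\text{eps}}$$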
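A minimal Python sketch of the two formulas above. The notes do not define `cost`, so this assumes a hypothetical model $y = w \cdot x + b$, a made-up dataset that follows $y = 2x$, and mean squared error as the cost; `TRAIN` and `finite_diff_grads` are illustrative names.

```python
# Hypothetical toy dataset of (input, expected output) pairs following y = 2x
TRAIN = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]


def cost(w, b):
    # Mean squared error of the model y = w * x + b over the dataset
    total = 0.0
    for x, y in TRAIN:
        d = (w * x + b) - y
        total += d * d
    return total / len(TRAIN)


def finite_diff_grads(w, b, eps=1e-3):
    c = cost(w, b)
    # Forward finite differences: nudge one parameter by eps, keep the other fixed
    dw = (cost(w + eps, b) - c) / eps
    db = (cost(w, b + eps) - c) / eps
    return dw, db


# Starting from w = 0, b = 0 this gives approximately dw = -23.994, db = -7.999
print(finite_diff_grads(0.0, 0.0))
```

Both values are negative, meaning that increasing `w` and `b` lowers the cost. The exact derivatives at this point are -24 and -8; the small gap comes from `eps` not being 0.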
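- Now let's adjust our parameters by subtracting `dw` from the parameter `w` and `db` from the parameter `b` (we subtract because the derivative points in the direction in which the cost increases, and we want the cost to decrease):
$$w = w - dw$$
$$b = b - db$$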
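A Python sketch that applies these plain updates a few times and records the cost after each step. It assumes a hypothetical setup (a made-up dataset following $y = 2x$, a linear model $y = w \cdot x + b$, and mean squared error as the cost); `TRAIN` and `finite_diff_grads` are illustrative names. On this data the steps are so large that the parameters overshoot the minimum further every time, so the cost grows instead of shrinking.

```python
# Hypothetical toy dataset of (input, expected output) pairs following y = 2x
TRAIN = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]


def cost(w, b):
    # Mean squared error of the model y = w * x + b over the dataset
    total = 0.0
    for x, y in TRAIN:
        d = (w * x + b) - y
        total += d * d
    return total / len(TRAIN)


def finite_diff_grads(w, b, eps=1e-3):
    # Forward finite differences for dw and db
    c = cost(w, b)
    return (cost(w + eps, b) - c) / eps, (cost(w, b + eps) - c) / eps


w, b = 0.0, 0.0
costs = [cost(w, b)]
for _ in range(5):
    dw, db = finite_diff_grads(w, b)
    # Plain update without a learning rate: subtract the full error distance
    w -= dw
    b -= db
    costs.append(cost(w, b))

# Each cost is larger than the one before it: the parameters keep overshooting
print(costs)
```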
- The first issue we encounter is that the error distance values can be large numbers, which makes our parameters `w` and `b` "jump around too much" and never reach the desired values
- Let's introduce the concept of a learning rate (`lear_rate`) to our model
- It gives us more control over how fast our model learns
- In our case it addresses the issue of large error distance values by scaling every step down before it is applied
$$w = w - (\text{lear\_rate} \cdot dw)$$
$$b = b - (\text{lear\_rate} \cdot db)$$
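A Python sketch of training with the learning rate, under the same hypothetical assumptions (made-up dataset following $y = 2x$, linear model $y = w \cdot x + b$, mean squared error as the cost). The value `lear_rate=0.01` and the step count are arbitrary choices for this toy data; with a rate that is too large the parameters can still overshoot, and with one that is too small learning becomes slow.

```python
# Hypothetical toy dataset of (input, expected output) pairs following y = 2x
TRAIN = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]


def cost(w, b):
    # Mean squared error of the model y = w * x + b over the dataset
    total = 0.0
    for x, y in TRAIN:
        d = (w * x + b) - y
        total += d * d
    return total / len(TRAIN)


def finite_diff_grads(w, b, eps):
    # Forward finite differences for dw and db
    c = cost(w, b)
    return (cost(w + eps, b) - c) / eps, (cost(w, b + eps) - c) / eps


def train(lear_rate=0.01, steps=2000, eps=1e-3):
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw, db = finite_diff_grads(w, b, eps)
        # Scale each step by the learning rate before subtracting it
        w -= lear_rate * dw
        b -= lear_rate * db
    return w, b


w, b = train()
# w ends up close to 2 and b close to 0, matching the data's y = 2x
print(w, b, cost(w, b))
```

Because `eps` is not 0, the forward-difference estimates are slightly biased, so `w` and `b` settle very near, but not exactly at, 2 and 0.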
- After an artificial neuron sums up its weighted inputs and adds its bias, the result is forwarded through an activation function
- The goal of the activation function is to take the summed-up value, which can be unbounded or "all over the place", squash it into a fixed range (for example between 0 and 1) and make the neuron non-linear
- One of the standard activation functions is named Sigmoid
- It's any mathematical function that has the characteristic "S"-shaped curve or sigmoid curve.
- In our case we will look at a sigmoid function that maps values from -Infinity to +Infinity to values between 0 and 1
- The closer the input is to -Infinity, the closer the output is to 0
- The closer the input is to +Infinity, the closer the output is to 1
- Let's take a look at a common example of a sigmoid function, the logistic function:
$$\sigma(x)=\frac{1}{1+e^{-x}}=\frac{e^{x}}{1+e^{x}}=1-\sigma(-x)$$
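A minimal Python sketch of the logistic function. Evaluating $1/(1+e^{-x})$ directly makes `math.exp` overflow for very negative inputs, so for $x < 0$ this version uses the equivalent form $e^x/(1+e^x)$ from the equation above; that branch is a numerical-safety choice, not something the definition requires.

```python
import math


def sigmoid(x):
    if x >= 0:
        # 1 / (1 + e^(-x)): here e^(-x) is at most 1, so it cannot overflow
        return 1.0 / (1.0 + math.exp(-x))
    # Equivalent form e^x / (1 + e^x) for negative x, where e^x is at most 1
    z = math.exp(x)
    return z / (1.0 + z)


# Large negative inputs land near 0, 0 maps to 0.5, large positive inputs land near 1
for x in [-1000.0, -10.0, 0.0, 10.0, 1000.0]:
    print(x, sigmoid(x))
```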