
minigrad

A small, from-scratch autograd repo in Python, implementing backpropagation with the ability to build small Neural Networks.

Greatly inspired by Andrej Karpathy's micrograd


TODO:

  1. utils and visualizations

  2. Implement all operations:

Relu, Log, Exp                          # unary ops
Sum, Max                                # reduce ops (with axis argument)
Add, Sub, Mul, Pow                      # binary ops (with broadcasting)
Reshape, Transpose, Slice               # movement ops
Matmul, Conv2D                          # processing ops
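
Each of these ops reduces to a forward rule plus a backward (gradient) rule. Below is a rough sketch of one way such ops could be structured; the `Function`, `Mul`, and `ReLU` classes are illustrative only, not minigrad's actual API:

```python
import numpy as np

class Function:
    """One op in the graph: forward computes the result, backward maps
    the gradient of the output back to gradients of the inputs."""

    def __init__(self):
        self.saved = ()           # values that backward will need

    def save_for_backward(self, *xs):
        self.saved = xs

class Mul(Function):
    def forward(self, x, y):
        self.save_for_backward(x, y)
        return x * y

    def backward(self, grad_out):
        x, y = self.saved
        # d(x*y)/dx = y, d(x*y)/dy = x
        return grad_out * y, grad_out * x

class ReLU(Function):
    def forward(self, x):
        self.save_for_backward(x)
        return np.maximum(x, 0)

    def backward(self, grad_out):
        (x,) = self.saved
        # gradient flows only where the input was positive
        return (grad_out * (x > 0),)
```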

Reverse-Mode Auto Differentiation

Let's say we have the expression 𝑧 = π‘₯1π‘₯2 + sin(π‘₯1) and want to find the derivatives 𝑑𝑧/𝑑π‘₯1 and 𝑑𝑧/𝑑π‘₯2. Reverse-mode AD splits this task into two parts, namely, a forward pass and a reverse pass.
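
To keep the walkthrough concrete, here is the same expression as a plain Python function (the name `f` is just for illustration); the hand-decomposed version of it follows below.

```python
import math

def f(x1, x2):
    # z = x1*x2 + sin(x1), the running example
    return x1 * x2 + math.sin(x1)

print(f(2, 3))  # 6.9092..., matches the forward pass below
```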

Forward pass:

The first step is to decompose the complex expression into a set of primitive ones, i.e. expressions consisting of at most a single operation or function call.

𝑀1 = π‘₯1

𝑀2 = π‘₯2

𝑀3 = 𝑀1 * 𝑀2

𝑀4 = sin(𝑀1)

𝑀5 = 𝑀3 + 𝑀4

𝑧 = 𝑀5

The advantage of this representation is that differentiation rules for each separate expression are already known.

For example, we know that the derivative of sin is cos, and so 𝑑𝑀4/𝑑𝑀1 = cos(𝑀1).

We will use this fact in the reverse pass below. Essentially, the forward pass consists of evaluating each of these expressions and saving the results.

Say our inputs are π‘₯1 = 2 and π‘₯2 = 3. Then we have:

𝑀1 = π‘₯1 = 2

𝑀2 = π‘₯2 = 3

𝑀3 = 𝑀1 * 𝑀2 = 6

𝑀4 = sin(𝑀1) β‰ˆ 0.91

𝑀5 = 𝑀3 + 𝑀4 β‰ˆ 6.91

𝑧 = 𝑀5 β‰ˆ 6.91
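
In code, the forward pass is just straight-line Python, one variable per primitive expression (variable names mirror the 𝑀's above):

```python
import math

x1, x2 = 2.0, 3.0

w1 = x1              # 2.0
w2 = x2              # 3.0
w3 = w1 * w2         # 6.0
w4 = math.sin(w1)    # 0.9092...
w5 = w3 + w4         # 6.9092...
z = w5               # 6.9092...
```

In a real autograd engine, each of these intermediate results (and the graph structure between them) would be saved, since the reverse pass needs them.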

Reverse pass:

This is the main part and it uses the chain rule.

In its basic form, the chain rule states that if you have a variable 𝑑(𝑒(𝑣)) that depends on 𝑒, which in turn depends on 𝑣, then:

𝑑𝑑/𝑑𝑣 = 𝑑𝑑/𝑑𝑒 * 𝑑𝑒/𝑑𝑣

or, if 𝑑 depends on 𝑣 via several paths / variables 𝑒𝑖, e.g.:

𝑒1 = 𝑓(𝑣)

𝑒2 = 𝑔(𝑣)

𝑑 = β„Ž(𝑒1,𝑒2)

then:

𝑑𝑑/𝑑𝑣 = βˆ‘π‘– 𝑑𝑑/𝑑𝑒𝑖 * 𝑑𝑒𝑖/𝑑𝑣

In terms of the expression graph, if we have a final node 𝑧 and input nodes 𝑀𝑖, and the path from 𝑧 to 𝑀𝑖 goes through intermediate nodes 𝑀𝑝 (i.e. 𝑧 = 𝑔(𝑀𝑝) where 𝑀𝑝 = 𝑓(𝑀𝑖)), we can find the derivative 𝑑𝑧/𝑑𝑀𝑖 as

𝑑𝑧/𝑑𝑀𝑖 = βˆ‘{𝑝 ∈ Parents(𝑖)} 𝑑𝑧/𝑑𝑀𝑝 * 𝑑𝑀𝑝/𝑑𝑀𝑖

In other words, to calculate the derivative of the output variable 𝑧 w.r.t. any intermediate or input variable 𝑀𝑖, we only need to know the derivatives of its parents and the formula for the derivative of the primitive expression 𝑀𝑝 = 𝑓(𝑀𝑖).
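
In code, this rule becomes plain gradient accumulation over the graph: each node adds its contribution (upstream gradient times local derivative) into every node it was computed from. A minimal sketch, assuming each node carries a `grad` field initialised to 0.0 and a `local_grads` list of `(input_node, d(node)/d(input_node))` pairs (names are hypothetical, not minigrad's actual API):

```python
def backward(output, topo_order):
    # `topo_order` lists every node so that inputs come before the
    # nodes computed from them; `output` is the final node z.
    output.grad = 1.0                     # seed: dz/dz = 1
    for node in reversed(topo_order):     # walk from z back towards the inputs
        for child, local in node.local_grads:
            # dz/d(child) += dz/d(node) * d(node)/d(child)
            child.grad += node.grad * local
```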

The reverse pass starts at the end (i.e. at 𝑑𝑧/𝑑𝑧) and propagates backward to all dependencies.

𝑑𝑧 / 𝑑𝑧 = 1

Then we know that 𝑧=𝑀5 and so:

𝑑𝑧 / 𝑑𝑀5 = 1

𝑀5 linearly depends on 𝑀3 and 𝑀4, so 𝑑𝑀5/𝑑𝑀3=1 and 𝑑𝑀5/𝑑𝑀4=1. Using the chain rule we find:

𝑑𝑧/𝑑𝑀3 = 𝑑𝑧/𝑑𝑀5 Γ— 𝑑𝑀5/𝑑𝑀3 = 1Γ—1 = 1

𝑑𝑧/𝑑𝑀4 = 𝑑𝑧/𝑑𝑀5 Γ— 𝑑𝑀5/𝑑𝑀4 = 1Γ—1 = 1

From the definition 𝑀3 = 𝑀1𝑀2 and the rules of partial derivatives, we find that 𝑑𝑀3/𝑑𝑀2 = 𝑀1. Thus:

𝑑𝑧/𝑑𝑀2 = 𝑑𝑧/𝑑𝑀3 Γ— 𝑑𝑀3/𝑑𝑀2 = 1 Γ— 𝑀1 = 𝑀1

This, as we already know from the forward pass, evaluates to:

𝑑𝑧/𝑑𝑀2 = 𝑀1 = 2

Finally, 𝑀1 contributes to 𝑧 via 𝑀3 and 𝑀4. Once again, from the rules of partial derivatives we know that 𝑑𝑀3/𝑑𝑀1 = 𝑀2 and 𝑑𝑀4/𝑑𝑀1 = cos(𝑀1). Thus:

𝑑𝑧/𝑑𝑀1 = 𝑑𝑧/𝑑𝑀3 * 𝑑𝑀3/𝑑𝑀1 + 𝑑𝑧/𝑑𝑀4 * 𝑑𝑀4/𝑑𝑀1 = 𝑀2 + cos(𝑀1)

And again, given known inputs, we can calculate it:

𝑑𝑧/𝑑𝑀1 = 𝑀2 + cos(𝑀1) = 3 + cos(2) β‰ˆ 2.58

Since 𝑀1 and 𝑀2 are just aliases for π‘₯1 and π‘₯2, we get our answer:

𝑑𝑧 / 𝑑π‘₯1 = 2.58


𝑑𝑧 / 𝑑π‘₯2 = 2

And all is done for the given expression!
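
The entire walkthrough can be reproduced with a tiny scalar autograd class in the spirit of micrograd. The sketch below is illustrative only; the `Value` class and its methods are not minigrad's actual API:

```python
import math

class Value:
    """A scalar that remembers how it was computed, for reverse-mode AD."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        # d(x*y)/dx = y and d(x*y)/dy = x
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def sin(self):
        # d(sin(x))/dx = cos(x)
        return Value(math.sin(self.data), (self,), (math.cos(self.data),))

    def backward(self):
        # Topologically order the graph, then push dz/dz = 1 backwards.
        topo, visited = [], set()

        def build(v):
            if v not in visited:
                visited.add(v)
                for p in v._parents:
                    build(p)
                topo.append(v)

        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            for parent, local in zip(v._parents, v._local_grads):
                parent.grad += v.grad * local

x1, x2 = Value(2.0), Value(3.0)
z = x1 * x2 + x1.sin()   # z = x1*x2 + sin(x1)
z.backward()
print(z.data)   # ~6.909
print(x1.grad)  # ~2.584, i.e. x2 + cos(x1)
print(x2.grad)  # 2.0, i.e. x1
```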


License: MIT

