Andy-P / RecurrentNN.jl

Deep RNN, LSTM, GRU, GF-RNN, and GF-LSTMs in Julia

Question regarding cost

zahachtah opened this issue · comments

Hi, I am trying to learn RNNs using RecurrentNN.jl. I am not fluent in Julia yet, so I expect this is due to me not understanding the language. I get everything to work in the examples, but I am confused by one thing:

How is the cost actually passed to the update algorithm?

In the simple example in the README.md in the git repository, you set the correct class with

outMat.dw[ix_target] -= 1;

and then call backprop and step, but I can't figure out how that information actually gets into the network. Neither backprop() nor step() actually gets the outMat variable passed to it.

Also, in the more extended example, you call the cost function and get the variable cost back from it. But the variable cost never gets passed to the network through backprop(g) or step(...)? In costfunc I also never see the cost being passed to anything.

If anyone would have a minute to explain how this happens (how the update algorithm knows about the function to minimise) I'd be most grateful.

@zahachtah,

TLDR version:
The magic is happening in graph.jl. Have a look at how anonymous functions are used.


The longer answer:
If you look at the forwardprop() method in LSTM.jl, you will see that it returns the output of the add() function in graph.jl. That output is an NNMatrix. This type holds two matrices: one for the weights (NNMatrix.w) and one for the derivatives (NNMatrix.dw) that we calculate during the backward pass.
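
For reference, here is a minimal sketch of that type. The field names follow how add() (shown further down) constructs it; the actual definition in the package may use different syntax or element types.

# Sketch only: two shape fields plus the value and derivative matrices.
mutable struct NNMatrix
    n::Int                  # number of rows
    d::Int                  # number of columns
    w::Matrix{Float64}      # values computed during the forward pass
    dw::Matrix{Float64}     # derivatives accumulated during the backward pass
end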

During the forward pass, we build up a list of anonymous functions as we compute each intermediate result. We then call backprop(), which calls each of these cached functions in reverse order (sketched below). It is these cached anonymous functions that pass the derivatives back through the network.
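
As a rough sketch of what that reverse pass amounts to (assuming, as the add() code below suggests, that the Graph type keeps the cached closures in a vector field called backprop):

# Walk the cached closures in reverse order; each one adds the gradient of its
# output (out.dw) into the .dw fields of its inputs.
function backprop(g::Graph)
    for i in length(g.backprop):-1:1
        g.backprop[i]()
    end
end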

For example, the last matrix operation of the forwardprop() method in LSTM.jl is the add() function. This line in the README example

 outMat.dw[ix_target] -= 1;

is actually setting the .dw field of the NNMatrix out created at the top of the add() function:

function add(g::Graph, ms::NNMatrix...)
    # out holds the elementwise sum; its .dw is what the README line above writes into
    out = NNMatrix(ms[1].n, ms[1].d, zeros(ms[1].n, ms[1].d), zeros(ms[1].n, ms[1].d))
    @inbounds for m in ms
        @inbounds for j in 1:m.d, i in 1:m.n
            out.w[i,j] += m.w[i,j]            # forward pass: accumulate each input into out
        end
    end
    if g.doBackprop
        # cache an anonymous function to be replayed later by backprop(g)
        push!(g.backprop,
            function ()
                @inbounds for m in ms
                    @inbounds for j in 1:m.d, i in 1:m.n
                        m.dw[i,j] += out.dw[i,j]   # backward pass: pass out's gradient to each input
                    end
                end
            end )
    end
    return out
end

When we call backprop(g::Graph), it starts by calling the last anonymous function in its list, which is the one add() pushed onto g.backprop during the forward pass (the function () ... end block in the code above). That function adds out.dw into the .dw fields of add()'s inputs, which are the outputs of the second-to-last function in the list, the mul() function. The process continues until every cached function built up during the forward pass has been called, and that is how the initial error is passed back through the network.
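
Putting the pieces together, a single training step has roughly this shape. This is only a sketch of the flow described above: Graph(true), forwardprop(g, model, x), update_weights!, model, x, ix_target, and learning_rate are placeholders here, not necessarily the package's actual names or signatures.

g = Graph(true)                        # assuming the constructor sets doBackprop, so closures get cached
outMat = forwardprop(g, model, x)      # forward pass builds up g.backprop while computing outMat.w
outMat.dw[ix_target] -= 1              # seed the error signal in the output's .dw (as in the README)
backprop(g)                            # replay the cached closures in reverse order
update_weights!(model, learning_rate)  # placeholder for the solver step that applies the accumulated .dw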

I hope this answers your question.

Andre

@zahachtah,

I'd like to close this issue if you have no follow up questions.

Andre

First off, thank you so much for your explanation. I am working through it to try to make an IJulia notebook tutorial which I could share. If I have another question, is there an email I could use instead? Mine is jon.norberg@ecology.su.se. Thanks again!

One quick question: if I want to do an LSTM that works on continuous inputs (one state and one possible action, input_vector_simple = [S, A]) and I have data to train it on, how would I feed the loss back? Would it be as simple as:

outMat.dw[:] -= ([S_RNNpredicted, A_RNNpredicted] - [S_desired, A_desired]).^2;

or I guess rather:

outMat.dw[:] -= [S_desired, A_desired] - [S_RNNpredicted, A_RNNpredicted];

to get the gradient?