twitter-archive / torch-autograd

Autograd automatically differentiates native Torch code

Different results due to optimize(true)

mys007 opened this issue · comments

I have encountered weird behavior when turning on optimization.

The following code trains a simple MLP with a cross-entropy loss. The loss can be computed in two equivalent ways (CrossEntropyCriterion, or LogSoftMax followed by ClassNLLCriterion). With optimization turned on, the two variants unexpectedly produce different results. When I turn optimization off, the printouts are identical. Likewise, if I move the definition of df into the loop with optimization still on (i.e. the gradient function is regenerated each iteration), the results agree as well.

PS: A piggy-backed issue is that loss.crossEntropy doesn't support batch mode, because util.logSumExp doesn't support it.
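For reference, here is a minimal numerical sketch of a batch-capable logSumExp (reducing over the last dimension of a 2D input). It is not autograd-aware and not the library's actual util.logSumExp; it only illustrates the missing batch path:

local t = require 'torch'

-- Hypothetical batch-capable logSumExp: handles a 1D vector or a 2D
-- batch (reduction over dim 2), with max-subtraction for numerical stability.
local function logSumExp(x)
   if x:dim() == 1 then
      local m = x:max()                              -- scalar max
      return m + math.log(t.exp(x - m):sum())
   else
      local m = t.max(x, 2)                          -- per-row max, size N x 1
      local z = t.exp(x - m:expandAs(x)):sum(2)      -- per-row sum, size N x 1
      return m + t.log(z)
   end
end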

Thank you for your help.

t = require 'torch'
grad = require 'autograd'

t.manualSeed(11)
grad.optimize(true) --COMMENT ME OUT

local params = {
   W = {
      t.randn(50,50),
      t.randn(50,10),
   }
}

local ces = grad.nn.CrossEntropyCriterion()
local cnl = grad.nn.ClassNLLCriterion()
local lsm = grad.nn.LogSoftMax()

local f = function(params, x, y)
   local h1 = t.tanh(x * params.W[1])
   local h2 = t.tanh(h1 * params.W[2])
   return ces(h2,y)
end

local g = function(params, x, y)
   local h1 = t.tanh(x * params.W[1])
   local h2 = t.tanh(h1 * params.W[2])
   return cnl(lsm(h2),y)
end

local df = grad(f)  --OR MOVE ME INTO THE LOOP

local dg = grad(g)


local inputs = t.Tensor(100,50):normal(0,1)
local targets = t.Tensor(100):fill(1)

for i=1,10 do
   local graddF, lossdF = df(params, inputs, targets)
   local graddG, lossdG = dg(params, inputs, targets)
   print(lossdF, graddF.W[1]:norm(), graddF.W[2]:norm())
   print(lossdG, graddG.W[1]:norm(), graddG.W[2]:norm())

   params.W[1]:add(-1e-3, graddF.W[1])
   params.W[2]:add(-1e-3, graddF.W[2])
end

That's pretty strange. I don't see any of the usual culprits.
cc @luketwitter

This bugged me again. CrossEntropyCriterion cannot be used with optimize = true; I remember trying to debug it a few months ago without any luck.

Apparently, starting from the second time the generated function is called, codegen runs backward before forward; the first call is fine.
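For anyone hitting this in the meantime, a workaround sketch consistent with the observations above (not a fix for the codegen ordering itself): either use the LogSoftMax + ClassNLLCriterion formulation instead of CrossEntropyCriterion, or regenerate the gradient function each iteration so the cached generated code is never reused:

for i = 1, 10 do
   local df = grad(f)   -- regenerate under optimize(true) each iteration
   local gradF, lossF = df(params, inputs, targets)
   params.W[1]:add(-1e-3, gradF.W[1])
   params.W[2]:add(-1e-3, gradF.W[2])
end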