Different results due to optimize(true)
mys007 opened this issue
I have encountered unexpected behavior when turning on optimization.
The following code computes a simple MLP with a cross-entropy loss. The loss can be computed in two equivalent ways: with CrossEntropyCriterion, or with LogSoftMax followed by ClassNLLCriterion. With optimization turned on, the two versions unexpectedly produce different results. When I turn optimization off, the printouts are identical. Likewise, when I move the definition of df
into the loop with optimization turned on (i.e. the gradient function is recreated on each iteration), the results agree as well.
PS: A piggy-backed issue is that loss.crossEntropy doesn't support batch mode, because util.logSumExp doesn't support it.
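For reference, a batch-aware logSumExp could reduce over the last dimension with the usual max-shift for numerical stability. This is only a sketch of what such a fix might look like, not autograd's actual util.logSumExp:

require 'torch'

-- Hypothetical batch-aware logSumExp: reduces over the last dimension,
-- subtracting the per-row max first for numerical stability.
local function logSumExp(x)
   local d = x:dim()
   local m = torch.max(x, d)                  -- keeps the reduced dim as size 1
   local shifted = torch.exp(x - m:expandAs(x))
   return torch.log(torch.sum(shifted, d)) + m
end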
Thank you for your help.
t = require 'torch'
grad = require 'autograd'

t.manualSeed(11)
grad.optimize(true) -- COMMENT ME OUT

-- two-layer MLP parameters
local params = {
   W = {
      t.randn(50, 50),
      t.randn(50, 10),
   }
}

local ces = grad.nn.CrossEntropyCriterion()
local cnl = grad.nn.ClassNLLCriterion()
local lsm = grad.nn.LogSoftMax()

-- loss via CrossEntropyCriterion
local f = function(params, x, y)
   local h1 = t.tanh(x * params.W[1])
   local h2 = t.tanh(h1 * params.W[2])
   return ces(h2, y)
end

-- the same loss via LogSoftMax + ClassNLLCriterion
local g = function(params, x, y)
   local h1 = t.tanh(x * params.W[1])
   local h2 = t.tanh(h1 * params.W[2])
   return cnl(lsm(h2), y)
end

local df = grad(f) -- OR MOVE ME INTO THE LOOP
local dg = grad(g)

local inputs = t.Tensor(100, 50):normal(0, 1)
local targets = t.Tensor(100):fill(1)

for i = 1, 10 do
   local graddF, lossdF = df(params, inputs, targets)
   local graddG, lossdG = dg(params, inputs, targets)
   -- the two printouts should agree, but differ when optimization is on
   print(lossdF, graddF.W[1]:norm(), graddF.W[2]:norm())
   print(lossdG, graddG.W[1]:norm(), graddG.W[2]:norm())
   params.W[1]:add(-1e-3, graddF.W[1])
   params.W[2]:add(-1e-3, graddF.W[2])
end
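For clarity, the workaround mentioned above amounts to rebuilding the gradient function on every iteration. A minimal sketch of the modified loop (same names as the repro above):

-- Workaround sketch: recreate the gradient function inside the loop so the
-- optimized code is regenerated each iteration (slower, but results then match).
for i = 1, 10 do
   local df = grad(f)
   local graddF, lossdF = df(params, inputs, targets)
   print(lossdF, graddF.W[1]:norm(), graddF.W[2]:norm())
   params.W[1]:add(-1e-3, graddF.W[1])
   params.W[2]:add(-1e-3, graddF.W[2])
end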
That's pretty strange. I don't see any of the usual culprits.
cc @luketwitter
This bugged me again. CrossEntropyCriterion cannot be used with optimize = true; I remember trying to debug it a few months ago without any luck.
Apparently, starting from the second time the generated function is called, codegen calls backward before forward. The first call is fine.
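If that ordering is right, it would explain the symptom: nn criteria are stateful, and backward relies on buffers cached by the preceding forward. A hedged illustration using plain nn semantics (this is an assumption about the failure mode, not the actual generated code):

require 'torch'
local nn = require 'nn'

local ce = nn.CrossEntropyCriterion()
local input = torch.randn(4, 10)
local target = torch.LongTensor{1, 2, 3, 1}

-- Correct order: forward caches the internal log-softmax output,
-- which backward then reuses to form the gradient.
local loss = ce:forward(input, target)
local gradInput = ce:backward(input, target)

-- If backward ran before forward on the second call, it would read the
-- previous call's cached state and return stale gradients, which would
-- match the diverging printouts above.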