Sharpness Aware Minimization (SAM) requires closure
manza-ari opened this issue
Hi, thank you so much for your repo. I am using the SAM optimizer but I am facing this error; how can I fix it?

```
RuntimeError: [-] Sharpness Aware Minimization (SAM) requires closure
```
Hello!
First of all, thanks for your interest in the repo!
You can find the usage in the docstring here: a `closure` function should be passed into the `step()` function.
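For example, here is a minimal sketch of the closure-based call (assuming generic `model`, `criterion`, `inputs`, and `labels`; the closure should redo the full forward-backward pass so the loss can be re-evaluated at the perturbed weights):

```python
# minimal sketch, not tied to any specific model
def closure():
    loss = criterion(model(inputs), labels)  # full forward pass
    loss.backward()                          # backward at the (perturbed) weights
    return loss

loss = criterion(model(inputs), labels)  # first forward-backward pass
loss.backward()
optimizer.step(closure)  # internally runs first_step -> closure() -> second_step
optimizer.zero_grad()
```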
If possible, please upload your code so that I can debug it more accurately :)
For now the docs are lacking, but someday I'm going to build documentation that makes it easy to use (I can't be sure when it will be done).
If you have more questions, feel free to comment here.
Best regards
Thank you for your reply. I have gone through the documentation, but I still don't get how to fix it. Here is the code:
```python
if method == 'lloss':
    models = {'backbone': resnet18, 'module': loss_module}

# Loss, criterion and scheduler (re)initialization
criterion = nn.CrossEntropyLoss(reduction='none')
base_optimizer = torch.optim.SGD
optim_backbone = SAM(models['backbone'].parameters(), base_optimizer, lr=LR,
                     momentum=MOMENTUM, weight_decay=WDECAY)
sched_backbone = lr_scheduler.MultiStepLR(optim_backbone, milestones=MILESTONES)

optimizers = {'backbone': optim_backbone}
schedulers = {'backbone': sched_backbone}
```
I think the definition part (your code above) is fine; the problem is in the training part.
To use the SAM optimizer, we should call it in the training loop like below:
```python
loss = criterion(output, model(input))  # use this loss for any training statistics
loss.backward()
optimizer.first_step(zero_grad=True)

# second forward-backward pass
criterion(output, model(input)).backward()  # make sure to do a full forward pass
optimizer.second_step(zero_grad=True)
```
(Here, `optimizer` is equal to your `optimizers['backbone']`, and `model` is equal to your `models['backbone']`.)
This is the training part:

```python
def train(models, method, criterion, optimizers, schedulers, dataloaders, num_epochs, epoch_loss):
    print('>> Train a Model.')
    best_acc = 0.

    for epoch in range(num_epochs):
        best_loss = torch.tensor([0.5]).cuda()
        loss = train_epoch(models, method, criterion, optimizers, dataloaders, epoch, epoch_loss)

        schedulers['backbone'].step()
        if method == 'lloss':
            schedulers['module'].step()

        if False and epoch % 20 == 7:
            acc = test(models, epoch, method, dataloaders, mode='test')
            # acc = test(models, dataloaders, mc, 'test')
            if best_acc < acc:
                best_acc = acc
            print('Val Acc: {:.3f} \t Best Acc: {:.3f}'.format(acc, best_acc))

    print('>> Finished.')
```
Maybe the loss backward part (`loss.backward()`) is inside the `train_epoch` function.
Thank you so much for your help. I wrote something like this:

```python
def train_epoch(models, method, criterion, optimizers, dataloaders, epoch, epoch_loss):
    models['backbone'].train()
    if method == 'lloss':
        models['module'].train()
    global iters

    for data in tqdm(dataloaders['train'], leave=False, total=len(dataloaders['train'])):
        with torch.cuda.device(CUDA_VISIBLE_DEVICES):
            inputs = data[0].cuda()
            labels = data[1].cuda()

        iters += 1

        optimizers['backbone'].zero_grad()
        if method == 'lloss':
            optimizers['module'].zero_grad()

        scores, _, features = models['backbone'](inputs)
        target_loss = criterion(scores, labels)

        if method == 'lloss':
            if epoch > epoch_loss:
                features[0] = features[0].detach()
                features[1] = features[1].detach()
                features[2] = features[2].detach()
                features[3] = features[3].detach()

            pred_loss = models['module'](features)
            pred_loss = pred_loss.view(pred_loss.size(0))

            m_module_loss = LossPredLoss(pred_loss, target_loss, margin=MARGIN)
            m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
            loss = m_backbone_loss + WEIGHT * m_module_loss
        else:
            m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
            loss = m_backbone_loss

        ####### SAM Optimizer
        loss.backward()
        optimizers['backbone'].first_step(zero_grad=True)
        criterion(scores, method(input)).backward()  # I have the error here
        optimizers['backbone'].second_step(zero_grad=True)

        if method == 'lloss':
            optimizers['module'].step()

    return loss
```
Maybe `criterion(scores, method(input)).backward()` should be changed to `criterion(models['backbone'](inputs)[0], labels).backward()`, following the same scheme as `criterion(scores, labels)`.
```
criterion(models['backbone'](inputs)[0], labels).backward()
  File "/home/kanza/anaconda3/envs/optuna/lib/python3.8/site-packages/torch/tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/kanza/anaconda3/envs/optuna/lib/python3.8/site-packages/torch/autograd/__init__.py", line 166, in backward
    grad_tensors_ = _make_grads(tensors, grad_tensors_, is_grads_batched=False)
  File "/home/kanza/anaconda3/envs/optuna/lib/python3.8/site-packages/torch/autograd/__init__.py", line 67, in _make_grads
    raise RuntimeError("grad can be implicitly created only for scalar outputs")
RuntimeError: grad can be implicitly created only for scalar outputs
```
Maybe `criterion` in your code doesn't return a scalar output. I think the whole block below effectively acts as the `criterion` function:
```python
target_loss = criterion(scores, labels)

if method == 'lloss':
    if epoch > epoch_loss:
        features[0] = features[0].detach()
        features[1] = features[1].detach()
        features[2] = features[2].detach()
        features[3] = features[3].detach()

    pred_loss = models['module'](features)
    pred_loss = pred_loss.view(pred_loss.size(0))

    m_module_loss = LossPredLoss(pred_loss, target_loss, margin=MARGIN)
    m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
    loss = m_backbone_loss + WEIGHT * m_module_loss
else:
    m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
    loss = m_backbone_loss

# `loss` here is the final loss.
```
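If it helps, one option (just a sketch, using a hypothetical `compute_loss` helper that is not part of your repo or mine) is to wrap that whole block into a function that always returns a scalar, so both SAM passes can reuse it:

```python
def compute_loss():
    # hypothetical helper: redo the full forward pass and return a scalar loss
    scores, _, features = models['backbone'](inputs)
    target_loss = criterion(scores, labels)   # reduction='none' -> per-sample vector
    m_backbone_loss = target_loss.mean()      # reduce to a scalar

    if method == 'lloss':
        if epoch > epoch_loss:
            features = [f.detach() for f in features]
        pred_loss = models['module'](features).view(-1)
        m_module_loss = LossPredLoss(pred_loss, target_loss, margin=MARGIN)
        return m_backbone_loss + WEIGHT * m_module_loss
    return m_backbone_loss

# then both SAM passes can call it:
compute_loss().backward()
optimizers['backbone'].first_step(zero_grad=True)
compute_loss().backward()
optimizers['backbone'].second_step(zero_grad=True)
```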
No, actually this repo uses multiple methods such as Random or 'lloss'.
I have removed that method's module for the sake of simplicity. Can you suggest something now?
```python
def train_epoch(models, method, criterion, optimizers, dataloaders, epoch, epoch_loss):
    models['backbone'].train()
    if method == 'lloss':
        models['module'].train()
    global iters

    for data in tqdm(dataloaders['train'], leave=False, total=len(dataloaders['train'])):
        with torch.cuda.device(CUDA_VISIBLE_DEVICES):
            inputs = data[0].cuda()
            labels = data[1].cuda()

        iters += 1

        optimizers['backbone'].zero_grad()
        if method == 'lloss':
            optimizers['module'].zero_grad()

        scores, _, features = models['backbone'](inputs)
        target_loss = criterion(scores, labels)
        m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
        loss = m_backbone_loss

        # ----------------- SAM Optimizer -----------------
        # loss = criterion(models['backbone'](inputs)[0], labels)
        loss.backward()
        optimizers['backbone'].first_step(zero_grad=True)
        optimizers['backbone'].second_step(zero_grad=True)
        criterion(models['backbone'](inputs)[0], labels).backward()

    return loss
```
I guess this should work:

```python
criterion(models['backbone'](inputs)[0], labels).backward()
optimizers['backbone'].first_step(zero_grad=True)
criterion(models['backbone'](inputs)[0], labels).backward()
optimizers['backbone'].second_step(zero_grad=True)
```
If there's still an error, please check my test code! (It runs with no errors and shows the correct usage.)
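For example, a minimal self-contained sketch like this runs cleanly (this is just an illustration, not the repo's actual test code; it assumes `SAM` is this repo's SAM optimizer class with `first_step`/`second_step`):

```python
import torch
import torch.nn as nn

# Illustration only: a toy model and random data.
# `SAM` is assumed to be this repo's SAM optimizer class (as used in your code above).
model = nn.Linear(10, 3)
criterion = nn.CrossEntropyLoss()  # default reduction='mean' -> scalar loss
optimizer = SAM(model.parameters(), torch.optim.SGD, lr=0.1, momentum=0.9)

inputs = torch.randn(8, 10)
labels = torch.randint(0, 3, (8,))

# first forward-backward pass
loss = criterion(model(inputs), labels)
loss.backward()
optimizer.first_step(zero_grad=True)

# second forward-backward pass (full forward again, now at the perturbed weights)
criterion(model(inputs), labels).backward()
optimizer.second_step(zero_grad=True)
```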
Thank you so much for your help and recommendations. I cannot thank you enough.
I fixed the error by adding `loss.backward()`:

```python
# ----------------- SAM Optimizer -----------------
loss.backward()
criterion(models['backbone'](inputs)[0], labels)
optimizers['backbone'].first_step(zero_grad=True)
criterion(models['backbone'](inputs)[0], labels)
optimizers['backbone'].second_step(zero_grad=True)
```
I hope I am using SAM correctly
Maybe you should call `loss.backward()` twice.
Only calling `criterion(models['backbone'](inputs)[0], labels)` doesn't do the backward pass; it just computes the loss.
The code below is the correct usage!

```python
loss = criterion(models['backbone'](inputs)[0], labels)
loss.backward()
optimizers['backbone'].first_step(zero_grad=True)

loss = criterion(models['backbone'](inputs)[0], labels)
loss.backward()
optimizers['backbone'].second_step(zero_grad=True)
```
When I call `loss.backward()` twice, it gives me the following error:

```
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
```
Following that error message, you can specify `retain_graph=True`; then maybe the error will be gone.
None of them are working
```python
# ----------------- SAM Optimizer -----------------
criterion(models['backbone'](inputs)[0], labels)
loss.backward(retain_graph=True)
optimizers['backbone'].first_step(zero_grad=True)
criterion(models['backbone'](inputs)[0], labels)
loss.backward(retain_graph=True)
optimizers['backbone'].second_step(zero_grad=True)

# ----------------- SAM Optimizer for LLOSS Method -----------------
if method == 'lloss':
    # optimizers['module'].step()
    loss1 = criterion(models['backbone'](inputs)[0], labels)
    loss1.backward()
    optimizers['module'].first_step(zero_grad=True)
    loss2 = criterion(models['backbone'](inputs)[0], labels)
    loss2.backward()
    optimizers['module'].second_step(zero_grad=True)
    loss = torch.tensor([loss1, loss2])
    loss.backward(gradient=torch.tensor([1.0, 1.0]))
```
Error:

```
Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 100]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```
As per the sample given here, https://github.com/davda54/sam, `loss.backward()` is not required before `second_step`. The first one is working fine for me, but it is not working for my LLOSS method.
Actually, it does (do backward twice)!
In the example code (taken from https://github.com/davda54/sam), it calls `backward()` twice. By the concept of the SAM optimizer, the forward-backward pass must be done twice!
```python
# first forward-backward pass
loss = loss_function(output, model(input))  # use this loss for any training statistics
loss.backward()
optimizer.first_step(zero_grad=True)

# second forward-backward pass
loss_function(output, model(input)).backward()  # make sure to do a full forward pass
# it is equal to:
#   loss = loss_function(output, model(input))
#   loss.backward()
optimizer.second_step(zero_grad=True)
```
Okay, right, but there are some errors when using `backward()` the second time. I don't know how to resolve it.

```
raise RuntimeError("grad can be implicitly created only for scalar outputs")
RuntimeError: grad can be implicitly created only for scalar outputs
```
The `grad can be implicitly created only for scalar outputs` error means that `loss` is not a scalar but a vector. You need to check whether the loss is a scalar.
It depends on the outputs of the model and the loss function in your code, so take a look at that part!
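For example (a sketch, assuming `criterion = nn.CrossEntropyLoss(reduction='none')` as in your code above), reducing the per-sample vector to a scalar before calling `backward()` avoids that error in both SAM passes:

```python
# criterion with reduction='none' returns one loss per sample (a vector),
# so reduce it with .mean() (or .sum()) before backward()
loss = criterion(models['backbone'](inputs)[0], labels).mean()
loss.backward()
optimizers['backbone'].first_step(zero_grad=True)

# second pass: full forward again, reduced to a scalar the same way
criterion(models['backbone'](inputs)[0], labels).mean().backward()
optimizers['backbone'].second_step(zero_grad=True)
```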