Sharpness Aware Minimization (SAM) requires closure
manza-ari opened this issue
Hi, thank you so much for your repo. I am using the SAM optimizer but I am facing this error; how can I fix it?

```
RuntimeError: [-] Sharpness Aware Minimization (SAM) requires closure
```
Hello!
First of all, thanks for your interest in the repo!
You can find the usage in the docstring here: a `closure` function should be passed into the `step()` function.
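For example, here is a minimal sketch of the closure-based call (assuming generic `model`, `criterion`, `inputs`, and `labels`; the closure should redo the full forward-backward pass so the loss can be re-evaluated at the perturbed weights):

```python
# minimal sketch, not tied to any specific model
def closure():
    loss = criterion(model(inputs), labels)  # full forward pass
    loss.backward()                          # backward at the (perturbed) weights
    return loss

loss = criterion(model(inputs), labels)  # first forward-backward pass
loss.backward()
optimizer.step(closure)  # internally runs first_step -> closure() -> second_step
optimizer.zero_grad()
```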
If possible, please upload your code so that I can debug it more accurately :)
For now the docs are lacking, but someday I'm going to build documentation that makes it easy to use (I can't be sure when it will be done).
If you have more questions, feel free to comment here.
Best regards
Thank you for your reply. I have gone through the documentation, but I still don't get how to fix it. Here is the code:
```python
if method == 'lloss':
    models = {'backbone': resnet18, 'module': loss_module}

# Loss, criterion and scheduler (re)initialization
criterion = nn.CrossEntropyLoss(reduction='none')
base_optimizer = torch.optim.SGD
optim_backbone = SAM(models['backbone'].parameters(), base_optimizer, lr=LR,
                     momentum=MOMENTUM, weight_decay=WDECAY)
sched_backbone = lr_scheduler.MultiStepLR(optim_backbone, milestones=MILESTONES)

optimizers = {'backbone': optim_backbone}
schedulers = {'backbone': sched_backbone}
```
I think the definition part (your code above) is fine; the problem is in the training part.
To use the SAM optimizer, we should call it in the training loop like below:
```python
loss = criterion(output, model(input))  # use this loss for any training statistics
loss.backward()
optimizer.first_step(zero_grad=True)

# second forward-backward pass
criterion(output, model(input)).backward()  # make sure to do a full forward pass
optimizer.second_step(zero_grad=True)
```
(Here, `optimizer` is equal to your `optimizers['backbone']`, and `model` is equal to your `models['backbone']`.)
This is the training part:

```python
def train(models, method, criterion, optimizers, schedulers, dataloaders, num_epochs, epoch_loss):
    print('>> Train a Model.')
    best_acc = 0.

    for epoch in range(num_epochs):
        best_loss = torch.tensor([0.5]).cuda()
        loss = train_epoch(models, method, criterion, optimizers, dataloaders, epoch, epoch_loss)

        schedulers['backbone'].step()
        if method == 'lloss':
            schedulers['module'].step()

        if False and epoch % 20 == 7:
            acc = test(models, epoch, method, dataloaders, mode='test')
            # acc = test(models, dataloaders, mc, 'test')
            if best_acc < acc:
                best_acc = acc
            print('Val Acc: {:.3f} \t Best Acc: {:.3f}'.format(acc, best_acc))

    print('>> Finished.')
```
Maybe the loss backward part (`loss.backward()`) is inside the `train_epoch` function.
Thank you so much for your help. I wrote something like this:

```python
def train_epoch(models, method, criterion, optimizers, dataloaders, epoch, epoch_loss):
    models['backbone'].train()
    if method == 'lloss':
        models['module'].train()
    global iters

    for data in tqdm(dataloaders['train'], leave=False, total=len(dataloaders['train'])):
        with torch.cuda.device(CUDA_VISIBLE_DEVICES):
            inputs = data[0].cuda()
            labels = data[1].cuda()

        iters += 1

        optimizers['backbone'].zero_grad()
        if method == 'lloss':
            optimizers['module'].zero_grad()

        scores, _, features = models['backbone'](inputs)
        target_loss = criterion(scores, labels)

        if method == 'lloss':
            if epoch > epoch_loss:
                features[0] = features[0].detach()
                features[1] = features[1].detach()
                features[2] = features[2].detach()
                features[3] = features[3].detach()

            pred_loss = models['module'](features)
            pred_loss = pred_loss.view(pred_loss.size(0))

            m_module_loss = LossPredLoss(pred_loss, target_loss, margin=MARGIN)
            m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
            loss = m_backbone_loss + WEIGHT * m_module_loss
        else:
            m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
            loss = m_backbone_loss

        ####### SAM Optimizer
        loss.backward()
        optimizers['backbone'].first_step(zero_grad=True)
        criterion(scores, method(input)).backward()  # I have the error here
        optimizers['backbone'].second_step(zero_grad=True)

        if method == 'lloss':
            optimizers['module'].step()

    return loss
```
Maybe `criterion(scores, method(input)).backward()` should be changed to `criterion(models['backbone'](inputs)[0], labels).backward()`, following the same scheme as `criterion(scores, labels)`.
```
criterion(models['backbone'](inputs)[0], labels).backward()
  File "/home/kanza/anaconda3/envs/optuna/lib/python3.8/site-packages/torch/tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/kanza/anaconda3/envs/optuna/lib/python3.8/site-packages/torch/autograd/__init__.py", line 166, in backward
    grad_tensors_ = _make_grads(tensors, grad_tensors_, is_grads_batched=False)
  File "/home/kanza/anaconda3/envs/optuna/lib/python3.8/site-packages/torch/autograd/__init__.py", line 67, in _make_grads
    raise RuntimeError("grad can be implicitly created only for scalar outputs")
RuntimeError: grad can be implicitly created only for scalar outputs
```
Maybe `criterion` in your code doesn't return a scalar output. I think the whole block below effectively acts as the `criterion` function:
```python
target_loss = criterion(scores, labels)

if method == 'lloss':
    if epoch > epoch_loss:
        features[0] = features[0].detach()
        features[1] = features[1].detach()
        features[2] = features[2].detach()
        features[3] = features[3].detach()

    pred_loss = models['module'](features)
    pred_loss = pred_loss.view(pred_loss.size(0))

    m_module_loss = LossPredLoss(pred_loss, target_loss, margin=MARGIN)
    m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
    loss = m_backbone_loss + WEIGHT * m_module_loss
else:
    m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
    loss = m_backbone_loss

# `loss` here is the final loss.
```
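If it helps, one option (just a sketch, using a hypothetical `compute_loss` helper that is not part of your repo or mine) is to wrap that whole block into a function that always returns a scalar, so both SAM passes can reuse it:

```python
def compute_loss():
    # hypothetical helper: redo the full forward pass and return a scalar loss
    scores, _, features = models['backbone'](inputs)
    target_loss = criterion(scores, labels)   # reduction='none' -> per-sample vector
    m_backbone_loss = target_loss.mean()      # reduce to a scalar

    if method == 'lloss':
        if epoch > epoch_loss:
            features = [f.detach() for f in features]
        pred_loss = models['module'](features).view(-1)
        m_module_loss = LossPredLoss(pred_loss, target_loss, margin=MARGIN)
        return m_backbone_loss + WEIGHT * m_module_loss
    return m_backbone_loss

# then both SAM passes can call it:
compute_loss().backward()
optimizers['backbone'].first_step(zero_grad=True)
compute_loss().backward()
optimizers['backbone'].second_step(zero_grad=True)
```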
No, actually this repo uses multiple methods such as Random or 'lloss'.
I have removed that method's module for the sake of simplicity. Can you suggest something now?
```python
def train_epoch(models, method, criterion, optimizers, dataloaders, epoch, epoch_loss):
    models['backbone'].train()
    if method == 'lloss':
        models['module'].train()
    global iters

    for data in tqdm(dataloaders['train'], leave=False, total=len(dataloaders['train'])):
        with torch.cuda.device(CUDA_VISIBLE_DEVICES):
            inputs = data[0].cuda()
            labels = data[1].cuda()

        iters += 1

        optimizers['backbone'].zero_grad()
        if method == 'lloss':
            optimizers['module'].zero_grad()

        scores, _, features = models['backbone'](inputs)
        target_loss = criterion(scores, labels)
        m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
        loss = m_backbone_loss

        # ----------------- SAM Optimizer -----------------
        # loss = criterion(models['backbone'](inputs)[0], labels)
        loss.backward()
        optimizers['backbone'].first_step(zero_grad=True)
        optimizers['backbone'].second_step(zero_grad=True)
        criterion(models['backbone'](inputs)[0], labels).backward()

    return loss
```
I guess this should work:

```python
criterion(models['backbone'](inputs)[0], labels).backward()
optimizers['backbone'].first_step(zero_grad=True)
criterion(models['backbone'](inputs)[0], labels).backward()
optimizers['backbone'].second_step(zero_grad=True)
```
If there's still an error, please check my test code! (It runs with no errors and shows the correct usage.)
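For example, a minimal self-contained sketch like this runs cleanly (this is just an illustration, not the repo's actual test code; it assumes `SAM` is this repo's SAM optimizer class with `first_step`/`second_step`):

```python
import torch
import torch.nn as nn

# Illustration only: a toy model and random data.
# `SAM` is assumed to be this repo's SAM optimizer class (as used in your code above).
model = nn.Linear(10, 3)
criterion = nn.CrossEntropyLoss()  # default reduction='mean' -> scalar loss
optimizer = SAM(model.parameters(), torch.optim.SGD, lr=0.1, momentum=0.9)

inputs = torch.randn(8, 10)
labels = torch.randint(0, 3, (8,))

# first forward-backward pass
loss = criterion(model(inputs), labels)
loss.backward()
optimizer.first_step(zero_grad=True)

# second forward-backward pass (full forward again, now at the perturbed weights)
criterion(model(inputs), labels).backward()
optimizer.second_step(zero_grad=True)
```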
Thank you so much for your help and recommendations. I cannot thank you enough.
I fixed the error by adding `loss.backward()`:

```python
# ----------------- SAM Optimizer -----------------
loss.backward()
criterion(models['backbone'](inputs)[0], labels)
optimizers['backbone'].first_step(zero_grad=True)
criterion(models['backbone'](inputs)[0], labels)
optimizers['backbone'].second_step(zero_grad=True)
```
I hope I am using SAM correctly
Maybe you should call `loss.backward()` twice.
Only calling `criterion(models['backbone'](inputs)[0], labels)` doesn't do the backward pass; it just computes the loss.
The code below is the correct usage!

```python
loss = criterion(models['backbone'](inputs)[0], labels)
loss.backward()
optimizers['backbone'].first_step(zero_grad=True)

loss = criterion(models['backbone'](inputs)[0], labels)
loss.backward()
optimizers['backbone'].second_step(zero_grad=True)
```
When I call `loss.backward()` twice, it gives me the following error:

```
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
```
Following that error message, you can specify `retain_graph=True`; then maybe the error will be gone.
None of them are working
```python
# ----------------- SAM Optimizer -----------------
criterion(models['backbone'](inputs)[0], labels)
loss.backward(retain_graph=True)
optimizers['backbone'].first_step(zero_grad=True)
criterion(models['backbone'](inputs)[0], labels)
loss.backward(retain_graph=True)
optimizers['backbone'].second_step(zero_grad=True)

# ----------------- SAM Optimizer for LLOSS Method -----------------
if method == 'lloss':
    # optimizers['module'].step()
    loss1 = criterion(models['backbone'](inputs)[0], labels)
    loss1.backward()
    optimizers['module'].first_step(zero_grad=True)
    loss2 = criterion(models['backbone'](inputs)[0], labels)
    loss2.backward()
    optimizers['module'].second_step(zero_grad=True)
    loss = torch.tensor([loss1, loss2])
    loss.backward(gradient=torch.tensor([1.0, 1.0]))
```
Error:

```
Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 100]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```
As per the sample given here, https://github.com/davda54/sam, `loss.backward()` is not required before `second_step`. The first one is working fine for me, but it is not working for my LLOSS method.
Actually, it does (do backward twice)!
In the example code (taken from https://github.com/davda54/sam), it calls `backward()` twice. By the concept of the SAM optimizer, the forward-backward pass must be done twice!
```python
# first forward-backward pass
loss = loss_function(output, model(input))  # use this loss for any training statistics
loss.backward()
optimizer.first_step(zero_grad=True)

# second forward-backward pass
loss_function(output, model(input)).backward()  # make sure to do a full forward pass
# it is equal to:
#   loss = loss_function(output, model(input))
#   loss.backward()
optimizer.second_step(zero_grad=True)
```
Okay, right, but there are some errors when using `backward()` the second time. I don't know how to resolve it.

```
raise RuntimeError("grad can be implicitly created only for scalar outputs")
RuntimeError: grad can be implicitly created only for scalar outputs
```
The `grad can be implicitly created only for scalar outputs` error means that `loss` is not a scalar but a vector. You need to check whether the loss is a scalar.
It depends on the outputs of the model and the loss function in your code, so take a look at that part!
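For example (a sketch, assuming `criterion = nn.CrossEntropyLoss(reduction='none')` as in your code above), reducing the per-sample vector to a scalar before calling `backward()` avoids that error in both SAM passes:

```python
# criterion with reduction='none' returns one loss per sample (a vector),
# so reduce it with .mean() (or .sum()) before backward()
loss = criterion(models['backbone'](inputs)[0], labels).mean()
loss.backward()
optimizers['backbone'].first_step(zero_grad=True)

# second pass: full forward again, reduced to a scalar the same way
criterion(models['backbone'](inputs)[0], labels).mean().backward()
optimizers['backbone'].second_step(zero_grad=True)
```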