kozistr / pytorch_optimizer

optimizer & lr scheduler & loss function collections in PyTorch

Home Page: https://pytorch-optimizers.readthedocs.io/en/latest/


Sharpness Aware Minimization (SAM) requires closure

manza-ari opened this issue · comments

commented

Hi, thank you so much for your repo. I am using the SAM optimizer, but I am facing this error. How can I fix it?

RuntimeError: [-] Sharpness Aware Minimization (SAM) requires closure

Hello!

First of all, thanks for your interest in the repo!

You can find the usage in the docstring here!
A closure function should be passed into the step() function.
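For reference, a minimal sketch of the closure-based usage (assuming the same step(closure) interface as the reference SAM implementation; the model, criterion, and data below are just placeholders):

    import torch
    import torch.nn as nn
    from pytorch_optimizer import SAM

    model = nn.Linear(10, 2)                            # tiny placeholder model
    criterion = nn.CrossEntropyLoss()
    inputs, labels = torch.randn(8, 10), torch.randint(0, 2, (8,))

    optimizer = SAM(model.parameters(), torch.optim.SGD, lr=0.1, momentum=0.9)

    def closure():
        # full forward-backward pass; SAM re-runs this for the second step
        loss = criterion(model(inputs), labels)
        loss.backward()
        return loss

    loss = criterion(model(inputs), labels)             # first forward-backward pass
    loss.backward()
    optimizer.step(closure)                             # second pass runs inside step()
    optimizer.zero_grad()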

If possible, please upload your code so that I can debug it more accurately :)

For now, the docs are lacking, but someday I'm going to build documentation that's easy to use (I can't be sure when it will be done).

If you have more questions, feel free to comment here.

Best regards

commented

Thank you for your reply. I have gone through this documentation, but I still don't understand how to fix it. Here is the code:

    if method == 'lloss':
        models = {'backbone': resnet18, 'module': loss_module}

        # Loss, criterion and scheduler (re)initialization
        criterion      = nn.CrossEntropyLoss(reduction='none')
        base_optimizer = torch.optim.SGD
        optim_backbone = SAM(models['backbone'].parameters(), base_optimizer, lr=LR,
                             momentum=MOMENTUM, weight_decay=WDECAY)
        sched_backbone = lr_scheduler.MultiStepLR(optim_backbone, milestones=MILESTONES)
        optimizers = {'backbone': optim_backbone}
        schedulers = {'backbone': sched_backbone}

I think the definition part (your code) is fine; the issue is in the training part.

To use the SAM optimizer, we should call it in the training loop like below:

    # use this loss for any training statistics
    loss = criterion(output, model(input))
    loss.backward()
    optimizer.first_step(zero_grad=True)

    # second forward-backward pass
    # make sure to do a full forward pass
    criterion(output, model(input)).backward()
    optimizer.second_step(zero_grad=True)

Here, optimizer corresponds to optimizers['backbone'], and model corresponds to models['backbone'].
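Put together with those names, one training iteration could look roughly like this (just a sketch; inputs and labels stand for one batch, and the per-sample losses are reduced to a scalar because criterion uses reduction='none'):

    # first forward-backward pass
    scores, _, features = models['backbone'](inputs)
    loss = criterion(scores, labels).mean()             # reduce per-sample losses to a scalar
    loss.backward()
    optimizers['backbone'].first_step(zero_grad=True)

    # second forward-backward pass: re-run the forward pass so a fresh graph is built
    scores, _, _ = models['backbone'](inputs)
    criterion(scores, labels).mean().backward()
    optimizers['backbone'].second_step(zero_grad=True)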

commented

This is the training part

    def train(models, method, criterion, optimizers, schedulers, dataloaders, num_epochs, epoch_loss):
        print('>> Train a Model.')
        best_acc = 0.

        for epoch in range(num_epochs):
            best_loss = torch.tensor([0.5]).cuda()
            loss = train_epoch(models, method, criterion, optimizers, dataloaders, epoch, epoch_loss)

            schedulers['backbone'].step()
            if method == 'lloss':
                schedulers['module'].step()

            if False and epoch % 20 == 7:
                acc = test(models, epoch, method, dataloaders, mode='test')
                # acc = test(models, dataloaders, mc, 'test')
                if best_acc < acc:
                    best_acc = acc
                    print('Val Acc: {:.3f} \t Best Acc: {:.3f}'.format(acc, best_acc))
        print('>> Finished.')

Maybe the loss backward part (loss.backward()) is in the train_epoch function.

commented

Thank you so much for your help. I wrote something like this

    def train_epoch(models, method, criterion, optimizers, dataloaders, epoch, epoch_loss):
        models['backbone'].train()
        if method == 'lloss':
            models['module'].train()
        global iters
        for data in tqdm(dataloaders['train'], leave=False, total=len(dataloaders['train'])):
            with torch.cuda.device(CUDA_VISIBLE_DEVICES):
                inputs = data[0].cuda()
                labels = data[1].cuda()

            iters += 1

            optimizers['backbone'].zero_grad()
            if method == 'lloss':
                optimizers['module'].zero_grad()

            scores, _, features = models['backbone'](inputs)
            target_loss = criterion(scores, labels)

            if method == 'lloss':
                if epoch > epoch_loss:
                    features[0] = features[0].detach()
                    features[1] = features[1].detach()
                    features[2] = features[2].detach()
                    features[3] = features[3].detach()

                pred_loss = models['module'](features)
                pred_loss = pred_loss.view(pred_loss.size(0))
                m_module_loss   = LossPredLoss(pred_loss, target_loss, margin=MARGIN)
                m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
                loss            = m_backbone_loss + WEIGHT * m_module_loss
            else:
                m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
                loss            = m_backbone_loss

            #######        SAM Optimizer

            loss.backward()
            optimizers['backbone'].first_step(zero_grad=True)

            criterion(scores, method(input)).backward()              # I have error here
            optimizers['backbone'].second_step(zero_grad=True)

            if method == 'lloss':
                optimizers['module'].step()

            return loss

criterion(scores, method(input)).backward()

Maybe it should be changed to

criterion(models['backbone'](inputs)[0], labels).backward()

i.e., a similar scheme to criterion(scores, labels).
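One caveat: criterion was created with reduction='none' above, so it returns a per-sample loss vector, and calling .backward() on it directly raises "grad can be implicitly created only for scalar outputs". Reducing it to a scalar first would look roughly like this (a sketch):

    # reduce the per-sample losses to a scalar before calling backward()
    second_loss = criterion(models['backbone'](inputs)[0], labels)
    (torch.sum(second_loss) / second_loss.size(0)).backward()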

commented

    criterion(models['backbone'](inputs)[0], labels).backward()
      File "/home/kanza/anaconda3/envs/optuna/lib/python3.8/site-packages/torch/tensor.py", line 363, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/home/kanza/anaconda3/envs/optuna/lib/python3.8/site-packages/torch/autograd/__init__.py", line 166, in backward
        grad_tensors_ = _make_grads(tensors, grad_tensors, is_grads_batched=False)
      File "/home/kanza/anaconda3/envs/optuna/lib/python3.8/site-packages/torch/autograd/__init__.py", line 67, in _make_grads
        raise RuntimeError("grad can be implicitly created only for scalar outputs")
    RuntimeError: grad can be implicitly created only for scalar outputs

Maybe criterion in your code doesn't return a scalar output.

I think the whole block below constitutes the criterion (the loss computation).

    target_loss = criterion(scores, labels)

    if method == 'lloss':
        if epoch > epoch_loss:
            features[0] = features[0].detach()
            features[1] = features[1].detach()
            features[2] = features[2].detach()
            features[3] = features[3].detach()

        pred_loss = models['module'](features)
        pred_loss = pred_loss.view(pred_loss.size(0))
        m_module_loss   = LossPredLoss(pred_loss, target_loss, margin=MARGIN)
        m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)        
        loss            = m_backbone_loss + WEIGHT * m_module_loss 
    else:
        m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)        
        loss            = m_backbone_loss
    # `loss` here is the final loss.
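One way to reuse that whole computation for both SAM passes is to wrap it in a small helper, roughly like below (a sketch using the names from the snippets above; the epoch > epoch_loss detach logic is omitted for brevity):

    def compute_loss():
        # full forward pass + scalar loss, reusable for both SAM passes
        scores, _, features = models['backbone'](inputs)
        target_loss = criterion(scores, labels)                         # per-sample vector (reduction='none')
        m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)

        if method == 'lloss':
            pred_loss = models['module'](features).view(-1)
            m_module_loss = LossPredLoss(pred_loss, target_loss, margin=MARGIN)
            return m_backbone_loss + WEIGHT * m_module_loss             # scalar
        return m_backbone_loss                                          # scalar

    loss = compute_loss()
    loss.backward()
    optimizers['backbone'].first_step(zero_grad=True)

    compute_loss().backward()    # fresh forward pass -> fresh graph for the second step
    optimizers['backbone'].second_step(zero_grad=True)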
commented

No, actually this repo uses multiple methods, such as Random or 'lloss'.
I have removed the module for that method for the sake of simplicity. Can you now suggest something?

    def train_epoch(models, method, criterion, optimizers, dataloaders, epoch, epoch_loss):
        models['backbone'].train()
        if method == 'lloss':
            models['module'].train()
        global iters
        for data in tqdm(dataloaders['train'], leave=False, total=len(dataloaders['train'])):
            with torch.cuda.device(CUDA_VISIBLE_DEVICES):
                inputs = data[0].cuda()
                labels = data[1].cuda()
            iters += 1
            optimizers['backbone'].zero_grad()

            if method == 'lloss':
                optimizers['module'].zero_grad()

            scores, _, features = models['backbone'](inputs)
            target_loss = criterion(scores, labels)

            m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
            loss            = m_backbone_loss

            # ----------------- SAM Optimizer -------------------
            # loss = criterion(models['backbone'](inputs)[0], labels)
            loss.backward()
            optimizers['backbone'].first_step(zero_grad=True)
            optimizers['backbone'].second_step(zero_grad=True)
            criterion(models['backbone'](inputs)[0], labels).backward()
            return loss

I guess this should work:

    criterion(models['backbone'](inputs)[0], labels).backward()
    optimizers['backbone'].first_step(zero_grad=True)

    criterion(models['backbone'](inputs)[0], labels).backward()
    optimizers['backbone'].second_step(zero_grad=True)

If there's still an error, please check my test code! (The codes below run with no errors and show the correct usage.)

  1. https://github.com/kozistr/pytorch_optimizer/blob/main/tests/test_optimizers.py#L187
  2. https://github.com/kozistr/pytorch_optimizer/blob/main/tests/test_optimizers.py#L213
commented

Thank you so much for your help and recommendations. I cannot thank you enough.
I fixed the error by adding loss.backward()

    # ----------------- SAM Optimizer -------------------

    loss.backward()
    criterion(models['backbone'](inputs)[0], labels)
    optimizers['backbone'].first_step(zero_grad=True)

    criterion(models['backbone'](inputs)[0], labels)
    optimizers['backbone'].second_step(zero_grad=True)
I hope I am using SAM correctly

Maybe you should call loss.backward() twice.

Only calling criterion(models['backbone'](inputs)[0], labels) doesn't do backward(); it just calculates the loss.

The code below is the correct usage!

    loss = criterion(models['backbone'](inputs)[0], labels)
    loss.backward()
    optimizers['backbone'].first_step(zero_grad=True)
  
    loss = criterion(models['backbone'](inputs)[0], labels)
    loss.backward()
    optimizers['backbone'].second_step(zero_grad=True)
commented

When I call loss.backward() twice, it gives me the following error:

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

Based on that error message, you can specify retain_graph=True; then the error may go away.
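The usual alternative to retain_graph=True (a sketch, not a tested fix for this exact code) is to re-run the forward pass for the second step, so a fresh graph is built instead of backpropagating through the same graph twice:

    loss = criterion(models['backbone'](inputs)[0], labels).mean()
    loss.backward()
    optimizers['backbone'].first_step(zero_grad=True)

    # second, fresh forward-backward pass instead of reusing the first graph
    loss2 = criterion(models['backbone'](inputs)[0], labels).mean()
    loss2.backward()
    optimizers['backbone'].second_step(zero_grad=True)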

commented

None of them are working

    # ----------------- SAM Optimizer -------------------

    criterion(models['backbone'](inputs)[0], labels)
    loss.backward(retain_graph=True)
    optimizers['backbone'].first_step(zero_grad=True)

    criterion(models['backbone'](inputs)[0], labels)
    loss.backward(retain_graph=True)
    optimizers['backbone'].second_step(zero_grad=True)

    # ----------------- SAM Optimizer for LLOSS Method -------------------

    if method == 'lloss':
        # optimizers['module'].step()
        loss1 = criterion(models['backbone'](inputs)[0], labels)
        loss1.backward()
        optimizers['module'].first_step(zero_grad=True)

        loss2 = criterion(models['backbone'](inputs)[0], labels)
        loss2.backward()
        optimizers['module'].second_step(zero_grad=True)

        loss = torch.tensor([loss1, loss2])
        loss.backward(gradient=torch.tensor([1.0, 1.0]))
Error:

    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 100]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

commented

As per the sample given here https://github.com/davda54/sam, loss.backward() is not required for second_step. The first one is working fine for me, but it is not working for my LLOSS method.

Actually, it does! (It does the backward pass twice.)

In the example code (taken from https://github.com/davda54/sam), backward() is done twice. And, by the concept of the SAM optimizer, the forward-backward pass must be done twice!

  # first forward-backward pass
  loss = loss_function(output, model(input))  # use this loss for any training statistics
  loss.backward()
  optimizer.first_step(zero_grad=True)
  
  # second forward-backward pass
  loss_function(output, model(input)).backward()  # make sure to do a full forward pass
  # it is equal to
  # loss = loss_function(output, model(input))
  # loss.backward()
  optimizer.second_step(zero_grad=True)
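For the lloss branch, a rough sketch could look like below (assuming both optimizers['backbone'] and optimizers['module'] are SAM instances, as in your last snippet; each pass does its own fresh forward, so no stale graph is reused):

    def lloss_total():
        # one combined scalar loss per forward pass
        scores, _, features = models['backbone'](inputs)
        target_loss = criterion(scores, labels)
        pred_loss = models['module'](features).view(-1)
        m_module_loss = LossPredLoss(pred_loss, target_loss, margin=MARGIN)
        m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
        return m_backbone_loss + WEIGHT * m_module_loss

    loss = lloss_total()
    loss.backward()
    optimizers['backbone'].first_step(zero_grad=True)
    optimizers['module'].first_step(zero_grad=True)

    lloss_total().backward()                             # second, fresh forward-backward pass
    optimizers['backbone'].second_step(zero_grad=True)
    optimizers['module'].second_step(zero_grad=True)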
commented

Okay, right, but there are some errors when using backward() the second time. I don't know how to resolve it.
raise RuntimeError("grad can be implicitly created only for scalar outputs")
RuntimeError: grad can be implicitly created only for scalar outputs


The "grad can be implicitly created only for scalar outputs" error means that the loss is not a scalar but a vector. You need to check whether the loss is a scalar.

It depends on the output(s) of the model & loss function in your code, so take a look into that part!
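A quick way to check (a small sketch; the names are placeholders):

    loss = criterion(scores, labels)
    print(loss.shape)     # torch.Size([]) -> scalar; torch.Size([N]) -> per-sample vector
    loss = loss.mean()    # or torch.sum(loss) / loss.size(0), as elsewhere in this thread
    loss.backward()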