davidtvs / pytorch-lr-finder

A learning rate range test implementation in PyTorch

I'm getting a blank graph

Shreeyak opened this issue · comments

I'm running a semantic segmentation model, DeepLabv3+, with a modified CrossEntropyLoss and either an SGD or Adam optimizer.
When I run the LRFinder, I get a blank graph with no losses shown, even though I printed the losses and the criterion is definitely returning valid values.

Sweeping across start_lr = 1e-07 and end_lr = 0.0001
  0%|                                                                                                                          | 0/10 [00:00<?, ?it/s]
loss:  tensor(89984., device='cuda:0', grad_fn=<DivBackward0>)
 10%|███████████▍                                                                                                      | 1/10 [00:06<00:54,  6.01s/it]
loss:  tensor(1588043.6250, device='cuda:0', grad_fn=<DivBackward0>)
 20%|██████████████████████▊                                                                                           | 2/10 [00:09<00:40,  5.12s/it]
loss:  tensor(420687.0938, device='cuda:0', grad_fn=<DivBackward0>)
 30%|██████████████████████████████████▏                                                                               | 3/10 [00:12<00:31,  4.50s/it]
loss:  tensor(653955.4375, device='cuda:0', grad_fn=<DivBackward0>)
 40%|█████████████████████████████████████████████▌                                                                    | 4/10 [00:15<00:24,  4.07s/it]
loss:  tensor(141592.6875, device='cuda:0', grad_fn=<DivBackward0>)
 50%|█████████████████████████████████████████████████████████                                                         | 5/10 [00:18<00:18,  3.76s/it]
loss:  tensor(97450.2891, device='cuda:0', grad_fn=<DivBackward0>)
 60%|████████████████████████████████████████████████████████████████████▍                                             | 6/10 [00:21<00:14,  3.55s/it]
loss:  tensor(160497.9375, device='cuda:0', grad_fn=<DivBackward0>)
 70%|███████████████████████████████████████████████████████████████████████████████▊                                  | 7/10 [00:24<00:10,  3.44s/it]
loss:  tensor(151121.3594, device='cuda:0', grad_fn=<DivBackward0>)
 80%|███████████████████████████████████████████████████████████████████████████████████████████▏                      | 8/10 [00:27<00:06,  3.38s/it]
loss:  tensor(123211.6484, device='cuda:0', grad_fn=<DivBackward0>)
 90%|██████████████████████████████████████████████████████████████████████████████████████████████████████▌           | 9/10 [00:31<00:03,  3.40s/it]
loss:  tensor(98576.7578, device='cuda:0', grad_fn=<DivBackward0>)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:34<00:00,  3.43s/it]
Learning rate search finished. See the graph with {finder_name}.plot()

Lemme know what other details I can attach.

My criterion:

import torch
import torch.nn as nn


def cross_entropy2d(logit, target, ignore_index=255, weight=None, batch_average=True):
    r"""
    Pixel-wise cross-entropy, summed over all pixels and (optionally)
    averaged over the batch:

    .. math::
        \text{loss} = \frac{1}{N} \sum_{n=1}^{N} \sum_{i} \text{CE}(x_{n,i}, y_{n,i})

    where ``logit`` has shape `(minibatch, C, d_1, d_2, ..., d_K)`.

    Args:
        logit (Tensor): Output of the network.
        target (Tensor): Ground-truth labels.
        ignore_index (int, optional): Defaults to 255. Pixels with this label do not contribute to the loss.
        weight (List, optional): Defaults to None. Weight assigned to each class.
        batch_average (bool, optional): Defaults to True. Whether to divide the summed loss by the batch size.

    Returns:
        Tensor: The value of the loss.
    """

    n, c, h, w = logit.shape
    target = target.squeeze(1)

    if weight is None:
        criterion = nn.CrossEntropyLoss(ignore_index=ignore_index, reduction='sum')
    else:
        criterion = nn.CrossEntropyLoss(weight=torch.tensor(weight, dtype=torch.float32),
                                        ignore_index=ignore_index,
                                        reduction='sum')

    loss = criterion(logit, target.long())

    if batch_average:
        loss /= n

    return loss
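As an aside, the large loss values in the log (on the order of 1e5) are expected with reduction='sum': the loss is summed over every pixel and divided only by the batch size. A minimal check with toy shapes (a sketch assuming only torch is installed; the shapes and data here are made up, not from the actual model):

```python
import torch
import torch.nn as nn

# toy batch: 2 images, 3 classes, 8x8 pixels
logit = torch.randn(2, 3, 8, 8, requires_grad=True)
target = torch.randint(0, 3, (2, 1, 8, 8))

# summed-then-batch-averaged loss, as in cross_entropy2d above
sum_criterion = nn.CrossEntropyLoss(reduction='sum')
loss = sum_criterion(logit, target.squeeze(1).long()) / logit.shape[0]

# default per-pixel mean, for comparison
mean_loss = nn.CrossEntropyLoss()(logit, target.squeeze(1).long())

# the summed version is exactly h*w = 64 times the per-pixel mean here
# (no ignored pixels, no class weights)
assert abs(loss.item() / mean_loss.item() - 64.0) < 1e-3

loss.backward()  # gradients flow, so LRFinder can train against this loss
```

The loss magnitude itself doesn't cause the blank graph, but it explains the ~1e5 values in the log above.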

I even tried running with the default CrossEntropyLoss, which gives loss values < 1. Still a blank graph:

def cross_entropy2d_lrfinder(logit, target, ignore_index=255, weight=None, batch_average=True):
    criterion = nn.CrossEntropyLoss()
    loss = criterion(logit, target.long())

    print('loss: ', loss)
    return loss
Sweeping across start_lr = 1e-07 and end_lr = 0.0001
  0%|                                                                                                                           | 0/3 [00:00<?, ?it/s]
loss:  tensor(0.7474, device='cuda:0', grad_fn=<NllLoss2DBackward>)
 33%|██████████████████████████████████████▎                                                                            | 1/3 [00:05<00:11,  5.54s/it]
loss:  tensor(0.7463, device='cuda:0', grad_fn=<NllLoss2DBackward>)
 67%|████████████████████████████████████████████████████████████████████████████▋                                      | 2/3 [00:08<00:04,  4.79s/it]
loss:  tensor(0.7460, device='cuda:0', grad_fn=<NllLoss2DBackward>)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:11<00:00,  3.88s/it]
Learning rate search finished. See the graph with {finder_name}.plot()

Hi @Shreeyak

That's quite weird. Can you try printing out lr_finder.history to see whether any values were recorded?

Yes, thanks!
Here's the output of lr_finder.history from a short run:

{'lr': [9.999999999999997e-06, 0.0001, 0.0009999999999999996], 'loss': [47049.02734375, 47058.836328125, 47008.379277343745]}

I'm using a conda env, with pytorch 1.5

torch                     1.5.0                    pypi_0    pypi
torch-lr-finder           0.1.5                    pypi_0    pypi
torchvision               0.6.0                    pypi_0    pypi

It seems losses are recorded properly. 🤔

Back to the original post: is the num_iter argument in lr_finder.range_test() perhaps too small for anything to be plotted? lr_finder.plot() has two default arguments, skip_start=10 and skip_end=5. You can try setting them both to 0 and re-plotting.

Oh, I think I've figured it out.

As mentioned in my previous comment, lr_finder.plot() has two default arguments, skip_start=10 and skip_end=5, which means the num_iter used in lr_finder.range_test() should be at least 15 (skip_start + skip_end). Otherwise, no values are left to plot after the history is trimmed.
And in the original post, num_iter seems to be only 10 (judging from the progress bar), so that's probably why nothing appeared in the graph.
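In other words, plot() trims the history before drawing anything. A simplified sketch of that trimming (illustrative, not the library's exact code):

```python
def trim_history(lrs, losses, skip_start=10, skip_end=5):
    # drop the first skip_start and last skip_end entries,
    # mirroring what lr_finder.plot() does before drawing
    end = len(lrs) - skip_end if skip_end > 0 else len(lrs)
    return lrs[skip_start:end], losses[skip_start:end]

# with num_iter=10 and the defaults, nothing survives: a blank graph
assert trim_history(list(range(10)), list(range(10))) == ([], [])

# with num_iter=100, 85 points remain
assert len(trim_history(list(range(100)), list(range(100)))[0]) == 85

# skip_start=0, skip_end=0 keeps everything
assert trim_history([1, 2, 3], [4, 5, 6], skip_start=0, skip_end=0) == ([1, 2, 3], [4, 5, 6])
```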

That's odd. Your suggestion worked (thanks!), but there are a few issues:

  1. I'm sure I got a blank graph when I ran initially with num_iter=100. Lemme run again and get back to you on that - I just reverted some changes that I'd made when reporting this error.

  2. I passed in a range start_lr=1e-7 and end_lr=1e-4, but the graph is showing loss values for 1e-5, 1e-4 and 1e-3. Why doesn't it start at 1e-6?
    Figure_2

  3. How do I get the lr_finder to run multiple batches for each "iteration"? I'd assumed that num_iter would control the number of batches/iterations for each value of lr in the given range.
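On the lr values in point 2: range_test() spaces learning rates exponentially by default (step_mode="exp"), so consecutive points jump by a constant factor, which shows up as decades on a log axis. A rough sketch of that spacing (an illustrative approximation, not the library's exact schedule):

```python
def exp_lr_schedule(start_lr, end_lr, num_iter):
    # constant multiplicative step from start_lr to end_lr, inclusive
    ratio = (end_lr / start_lr) ** (1.0 / (num_iter - 1))
    return [start_lr * ratio ** i for i in range(num_iter)]

# e.g. 4 iterations from 1e-7 to 1e-4 gives approximately
# [1e-07, 1e-06, 1e-05, 1e-04]
lrs = exp_lr_schedule(1e-7, 1e-4, 4)
```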

Uh, now passing num_iter=100 is giving me a graph, with and without passing skip_start and skip_end to lr_finder.plot(). Dunno, maybe I'd messed something up in my prev run. Here are the figures for both runs:
Figure_3
Figure_4

Thanks for the fast response and resolution.

Okay, glad it's resolved. 😊
Feel free to reopen this issue if the same error occurs again.