I'm getting a blank graph
Shreeyak opened this issue · comments
I'm running a semantic segmentation model, DeepLabv3+, with a modified CrossEntropyLoss and either the SGD or Adam optimizer.
When I run the LRFinder, I get a blank graph with no losses shown, even though I printed the losses and the criterion is definitely returning valid values.
Sweeping across start_lr = 1e-07 and end_lr = 0.0001
0%| | 0/10 [00:00<?, ?it/s]
loss: tensor(89984., device='cuda:0', grad_fn=<DivBackward0>)
10%|███████████▍ | 1/10 [00:06<00:54, 6.01s/it]
loss: tensor(1588043.6250, device='cuda:0', grad_fn=<DivBackward0>)
20%|██████████████████████▊ | 2/10 [00:09<00:40, 5.12s/it]
loss: tensor(420687.0938, device='cuda:0', grad_fn=<DivBackward0>)
30%|██████████████████████████████████▏ | 3/10 [00:12<00:31, 4.50s/it]
loss: tensor(653955.4375, device='cuda:0', grad_fn=<DivBackward0>)
40%|█████████████████████████████████████████████▌ | 4/10 [00:15<00:24, 4.07s/it]
loss: tensor(141592.6875, device='cuda:0', grad_fn=<DivBackward0>)
50%|█████████████████████████████████████████████████████████ | 5/10 [00:18<00:18, 3.76s/it]
loss: tensor(97450.2891, device='cuda:0', grad_fn=<DivBackward0>)
60%|████████████████████████████████████████████████████████████████████▍ | 6/10 [00:21<00:14, 3.55s/it]
loss: tensor(160497.9375, device='cuda:0', grad_fn=<DivBackward0>)
70%|███████████████████████████████████████████████████████████████████████████████▊ | 7/10 [00:24<00:10, 3.44s/it]
loss: tensor(151121.3594, device='cuda:0', grad_fn=<DivBackward0>)
80%|███████████████████████████████████████████████████████████████████████████████████████████▏ | 8/10 [00:27<00:06, 3.38s/it]
loss: tensor(123211.6484, device='cuda:0', grad_fn=<DivBackward0>)
90%|██████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 9/10 [00:31<00:03, 3.40s/it]
loss: tensor(98576.7578, device='cuda:0', grad_fn=<DivBackward0>)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:34<00:00, 3.43s/it]
Learning rate search finished. See the graph with {finder_name}.plot()
Lemme know what other details I can attach.
My criterion:
import torch
import torch.nn as nn

def cross_entropy2d(logit, target, ignore_index=255, weight=None, batch_average=True):
    """2D cross-entropy loss for logits of shape `(minibatch, C, H, W)`.

    Args:
        logit (Tensor): Output of the network.
        target (Tensor): Ground truth labels.
        ignore_index (int, optional): Defaults to 255. Pixels with this label
            do not contribute to the loss.
        weight (List, optional): Defaults to None. Weight assigned to each class.
        batch_average (bool, optional): Defaults to True. Whether to average
            the summed loss over the batch size.

    Returns:
        Tensor: The value of the loss.
    """
    n, c, h, w = logit.shape
    target = target.squeeze(1)
    if weight is None:
        criterion = nn.CrossEntropyLoss(ignore_index=ignore_index, reduction='sum')
    else:
        criterion = nn.CrossEntropyLoss(weight=torch.tensor(weight, dtype=torch.float32),
                                        ignore_index=ignore_index,
                                        reduction='sum')
    loss = criterion(logit, target.long())
    if batch_average:
        loss /= n
    return loss
I even tried running with the default CrossEntropyLoss, which gives loss values <1. Still a blank graph:
def cross_entropy2d_lrfinder(logit, target, ignore_index=255, weight=None, batch_average=True):
    criterion = nn.CrossEntropyLoss()
    loss = criterion(logit, target.long())
    print('loss: ', loss)
    return loss
Sweeping across start_lr = 1e-07 and end_lr = 0.0001
0%| | 0/3 [00:00<?, ?it/s]
loss: tensor(0.7474, device='cuda:0', grad_fn=<NllLoss2DBackward>)
33%|██████████████████████████████████████▎ | 1/3 [00:05<00:11, 5.54s/it]
loss: tensor(0.7463, device='cuda:0', grad_fn=<NllLoss2DBackward>)
67%|████████████████████████████████████████████████████████████████████████████▋ | 2/3 [00:08<00:04, 4.79s/it]
loss: tensor(0.7460, device='cuda:0', grad_fn=<NllLoss2DBackward>)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:11<00:00, 3.88s/it]
Learning rate search finished. See the graph with {finder_name}.plot()
Hi @Shreeyak
That's quite weird. Can you try printing out lr_finder.history to see whether any values are recorded?
Yes, thanks! Here's the output of lr_finder.history from a short run:
{'lr': [9.999999999999997e-06, 0.0001, 0.0009999999999999996], 'loss': [47049.02734375, 47058.836328125, 47008.379277343745]}
I'm using a conda env, with pytorch 1.5
torch 1.5.0 pypi_0 pypi
torch-lr-finder 0.1.5 pypi_0 pypi
torchvision 0.6.0 pypi_0 pypi
It seems losses are recorded properly. 🤔
Back to the original post: is the argument num_iter in lr_finder.range_test() perhaps not large enough for anything to be plotted? There are 2 default arguments in lr_finder.plot(), skip_start=10 and skip_end=5. You can try setting them both to 0 and re-plotting.
Oh, I think I've figured it out.
As mentioned in my previous comment, there are 2 default arguments in lr_finder.plot(), skip_start=10 and skip_end=5, which means the num_iter used in lr_finder.range_test() should be at least 15 (skip_start + skip_end). Otherwise, there won't be any values left to plot after the history is trimmed.
In the original post, num_iter appears to be only 10 (judging from the progress bar), so that's probably why nothing shows up in the graph.
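For illustration, the trimming described above can be sketched in plain Python. This is a hypothetical helper mirroring the documented skip_start/skip_end behavior of lr_finder.plot(), not the library's actual implementation:

```python
def trim_history(values, skip_start=10, skip_end=5):
    """Mimic lr_finder.plot() trimming: drop the first skip_start
    and last skip_end entries before plotting."""
    end = len(values) - skip_end
    return values[skip_start:end] if end > skip_start else []

# With num_iter=10 and the default skips, nothing is left to plot:
print(trim_history(list(range(10))))   # []
# With num_iter=20, iterations 10..14 survive:
print(trim_history(list(range(20))))   # [10, 11, 12, 13, 14]
```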
That's odd. Your suggestion worked (thanks!), but there are a few issues:
- I'm sure I got a blank graph when I initially ran with num_iter=100. Lemme run again and get back to you on that; I just reverted some changes I'd made when reporting this error.
- I passed in a range of start_lr=1e-7 and end_lr=1e-4, but the graph is showing loss values for 10e-5, 10e-4 and 10e-3. Why doesn't it start at 10e-6?
- How do I get the lr_finder to run multiple batches for each "iteration"? I'd assumed that num_iter would control the number of batches/iterations for each lr value in the given range.
Okay, glad it's resolved. 😊
Feel free to reopen this issue if the same error occurs again.