DeNA / PyTorch_YOLOv3

Implementation of YOLOv3 in PyTorch



I have a question about resuming training

deeppower opened this issue · comments

Thanks for your great work.
I want to resume training from snapshot.ckpt, so I loaded the checkpoint and changed the starting iteration.
However, the following error occurred:
```
  line 190, in forward
    loss_xy = bceloss(output[..., :2], target[..., :2])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py", line 504, in forward
    return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 2027, in binary_cross_entropy
    input, target, weight, reduction_enum)
RuntimeError: reduce failed to synchronize: device-side assert triggered
```
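
Roughly, the resume logic looks like this (a minimal sketch; the checkpoint key names and the helper shown are hypothetical and may not match the snapshot format actually written by this repo's train.py):

```python
import torch

def load_resume_state(ckpt_path, model, optimizer):
    """Restore model/optimizer state and return the iteration to resume from.

    The checkpoint keys below ("model_state_dict", "optimizer_state_dict",
    "iter") are assumptions for illustration only.
    """
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(checkpoint["model_state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    return checkpoint["iter"] + 1  # continue counting from the saved iteration
```

The training loop then starts from the returned iteration instead of 0.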

Thank you for using our repo!

Did you change anything from the previous settings?
If yes, I recommend resuming with LR burn-in (see the sketch below).
If no, I think it should work... please try it a few more times and see whether the error persists.
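
For reference, the device-side assert in binary_cross_entropy is typically triggered when the predictions fall outside [0, 1] or become NaN, which can happen if the resumed model briefly diverges; warming the learning rate back up avoids that. A minimal sketch of such a burn-in schedule (the constants and the helper name are assumptions based on common YOLOv3 configs, not this repo's exact implementation):

```python
def burnin_learning_rate(base_lr, iter_i, burn_in=1000, power=4):
    """Darknet-style LR burn-in (warm-up).

    Ramps the learning rate from ~0 up to base_lr over the first `burn_in`
    iterations; afterwards the normal schedule applies. The burn_in length
    and the quartic ramp are assumed values and may differ from this repo's
    config file.
    """
    if iter_i < burn_in:
        return base_lr * (iter_i / burn_in) ** power
    return base_lr


# Hypothetical training-loop usage: set the warmed-up LR each step,
# counting the ramp relative to the resume point.
# for iter_i in range(start_iter, start_iter + max_iter):
#     for group in optimizer.param_groups:
#         group["lr"] = burnin_learning_rate(base_lr, iter_i - start_iter)
```

Note that when resuming, the ramp only takes effect if it is computed relative to the resume iteration (as in the commented usage); if the absolute iteration counter is already past burn_in, the warm-up is a no-op.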

@hirotomusiker
Thanks! Resuming with LR burn-in worked.