Loss value is NaN from the start of training. How can I solve it?
FFudi opened this issue · comments
From the very start of training, the loss value is NaN. How can I solve it? The config is set up as follows (custom dataset with 29 classes, 1290×1080 images, converted to COCO format).
I tried the following, but the result is still NaN.
First, my config:
```yaml
MODEL:
  TYPE: YOLOv3
  BACKBONE: darknet53
  ANCHORS: [[10, 13], [16, 30], [33, 23],
            [30, 61], [62, 45], [59, 119],
            [116, 90], [156, 198], [373, 326]]
  ANCH_MASK: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
  N_CLASSES: 29
  GAUSSIAN: True
TRAIN:
  LR: 0.001
  MOMENTUM: 0.9
  DECAY: 0.0005
  BURN_IN: 1000
  MAXITER: 60200
  STEPS: (48000, 54000)
  BATCHSIZE: 4
  SUBDIVISION: 2
  IMGSIZE: 608
  LOSSTYPE: l2
  IGNORETHRE: 0.7
  GRADIENT_CLIP: 2000.0
AUGMENTATION:
  RANDRESIZE: True
  JITTER: 0.3
  RANDOM_PLACING: True
  HUE: 0.1
  SATURATION: 1.5
  EXPOSURE: 1.5
  LRFLIP: True
  RANDOM_DISTORT: True
TEST:
  CONFTHRE: 0.8
  NMSTHRE: 0.45
  IMGSIZE: 416
NUM_GPUS: 1
```
In addition, I changed the YOLOLayer class as follows:

```python
def _gaussian_dist_pdf(self, val, mean, var):
    sigma_const = 0.3
    return torch.exp(-(val - mean) ** 2.0 / var / 2.0) / torch.sqrt(2.0 * np.pi * var) + sigma_const
```
I want to solve this problem.
Hi @FFudi, you should add `sigma_const` to `var`, not to the loss itself:
```python
def _gaussian_dist_pdf(self, val, mean, var):
    sigma_const = 0.3
    return torch.exp(-(val - mean) ** 2.0 / (var + sigma_const) / 2.0) / torch.sqrt(2.0 * np.pi * (var + sigma_const))
```
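To see why this placement matters, here is a small sanity check of the two formulas on plain scalars. NumPy stands in for torch here purely so the arithmetic can be checked without a full model: when the predicted variance hits 0, the original version divides 0 by 0 (NaN), while the corrected version keeps the denominator bounded below by `sigma_const`. The scalar values are made up for illustration.

```python
import numpy as np

SIGMA_CONST = 0.3  # same constant as in the snippets above

def pdf_buggy(val, mean, var):
    # sigma_const added to the pdf output: var itself can still reach 0,
    # so both exp(-x/var) and sqrt(var) blow up -> 0/0 -> NaN.
    return np.exp(-(val - mean) ** 2 / var / 2.0) / np.sqrt(2.0 * np.pi * var) + SIGMA_CONST

def pdf_fixed(val, mean, var):
    # sigma_const added to var: the denominator is bounded away from 0.
    return np.exp(-(val - mean) ** 2 / (var + SIGMA_CONST) / 2.0) / np.sqrt(2.0 * np.pi * (var + SIGMA_CONST))

val, mean, var = np.float64(0.6), np.float64(0.5), np.float64(0.0)

with np.errstate(divide="ignore", invalid="ignore"):
    bad = pdf_buggy(val, mean, var)   # NaN: one bad cell poisons the summed loss
good = pdf_fixed(val, mean, var)      # finite

print(bad, good)
```

Early in training the network's raw variance output can easily be (numerically) zero, which is why the loss is NaN from the very first iterations rather than after divergence.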