Loss value is NaN from the start of training. How can I solve it?
FFudi opened this issue · comments
From the very start of training, the loss value is NaN. How can I solve it? The config is set up as follows (custom dataset with 29 classes, 1290×1080 images, converted to COCO format).
I tried the following, but the result is still NaN.
First, my config:
```yaml
MODEL:
  TYPE: YOLOv3
  BACKBONE: darknet53
  ANCHORS: [[10, 13], [16, 30], [33, 23],
            [30, 61], [62, 45], [59, 119],
            [116, 90], [156, 198], [373, 326]]
  ANCH_MASK: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
  N_CLASSES: 29
  GAUSSIAN: True
TRAIN:
  LR: 0.001
  MOMENTUM: 0.9
  DECAY: 0.0005
  BURN_IN: 1000
  MAXITER: 60200
  STEPS: (48000, 54000)
  BATCHSIZE: 4
  SUBDIVISION: 2
  IMGSIZE: 608
  LOSSTYPE: l2
  IGNORETHRE: 0.7
  GRADIENT_CLIP: 2000.0
AUGMENTATION:
  RANDRESIZE: True
  JITTER: 0.3
  RANDOM_PLACING: True
  HUE: 0.1
  SATURATION: 1.5
  EXPOSURE: 1.5
  LRFLIP: True
  RANDOM_DISTORT: True
TEST:
  CONFTHRE: 0.8
  NMSTHRE: 0.45
  IMGSIZE: 416
NUM_GPUS: 1
```
In addition, I changed the YOLOLayer class as follows:

```python
def _gaussian_dist_pdf(self, val, mean, var):
    sigma_const = 0.3
    return torch.exp(-(val - mean) ** 2.0 / var / 2.0) / torch.sqrt(2.0 * np.pi * var) + sigma_const
```
I want to solve this problem.
Hi @FFudi, you should add `sigma_const` to `var`, not to the loss itself:
```python
def _gaussian_dist_pdf(self, val, mean, var):
    sigma_const = 0.3
    return torch.exp(-(val - mean) ** 2.0 / (var + sigma_const) / 2.0) / torch.sqrt(2.0 * np.pi * (var + sigma_const))
```
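To see why this placement matters, here is a small sanity check of the two formulas on plain scalars. NumPy stands in for torch here purely so the arithmetic can be checked without a full model: when the predicted variance hits 0, the original version divides 0 by 0 (NaN), while the corrected version keeps the denominator bounded below by `sigma_const`. The scalar values are made up for illustration.

```python
import numpy as np

SIGMA_CONST = 0.3  # same constant as in the snippets above

def pdf_buggy(val, mean, var):
    # sigma_const added to the pdf output: var itself can still reach 0,
    # so both exp(-x/var) and sqrt(var) blow up -> 0/0 -> NaN.
    return np.exp(-(val - mean) ** 2 / var / 2.0) / np.sqrt(2.0 * np.pi * var) + SIGMA_CONST

def pdf_fixed(val, mean, var):
    # sigma_const added to var: the denominator is bounded away from 0.
    return np.exp(-(val - mean) ** 2 / (var + SIGMA_CONST) / 2.0) / np.sqrt(2.0 * np.pi * (var + SIGMA_CONST))

val, mean, var = np.float64(0.6), np.float64(0.5), np.float64(0.0)

with np.errstate(divide="ignore", invalid="ignore"):
    bad = pdf_buggy(val, mean, var)   # NaN: one bad cell poisons the summed loss
good = pdf_fixed(val, mean, var)      # finite

print(bad, good)
```

Early in training the network's raw variance output can easily be (numerically) zero, which is why the loss is NaN from the very first iterations rather than after divergence.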