CoinCheung / BiSeNet

Add bisenetv2. My implementation of BiSeNet


I am training on my own dataset, where the image size is about 7000*368. During training the loss is very high and stays high. Do you know what the reason is?

Easonshow opened this issue

The training log is below:
Reducer buckets have been rebuilt in this iteration.
iter: 100/180000, lr: 0.003454, eta: 5:22:45, time: 10.87, loss: 132873553.8683, loss_pre: 132873553.8683, loss_aux0: 0.0000, loss_aux1: 0.0000, loss_aux2: 0.0000, loss_aux3: 0.0000
iter: 200/180000, lr: 0.004348, eta: 5:12:08, time: 10.06, loss: 46347.5991, loss_pre: 46347.5991, loss_aux0: 0.0000, loss_aux1: 0.0000, loss_aux2: 0.0000, loss_aux3: 0.0000
iter: 300/180000, lr: 0.005474, eta: 5:12:48, time: 10.50, loss: 307194.5442, loss_pre: 307194.5442, loss_aux0: 0.0000, loss_aux1: 0.0000, loss_aux2: 0.0000, loss_aux3: 0.0000
iter: 400/180000, lr: 0.006892, eta: 5:08:07, time: 9.84, loss: 20168.8101, loss_pre: 20168.8101, loss_aux0: 0.0000, loss_aux1: 0.0000, loss_aux2: 0.0000, loss_aux3: 0.0000
iter: 500/180000, lr: 0.008676, eta: 5:08:08, time: 10.32, loss: 148601438.1153, loss_pre: 148601438.1153, loss_aux0: 0.0000, loss_aux1: 0.0000, loss_aux2: 0.0000, loss_aux3: 0.0000
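
A side note on this log, not from the thread itself: loss_pre swings across several orders of magnitude between iterations while every aux loss stays at exactly 0.0000. A generic first debugging step is to fail fast on non-finite losses. The sketch below is a minimal stand-in loop, not this repository's train script; model, optimizer, and dataloader are hypothetical placeholders.

import math
import torch.nn.functional as F

# Hypothetical minimal training loop; `model`, `optimizer`, and
# `dataloader` are placeholders, not names from this repository.
for it, (images, labels) in enumerate(dataloader):
    optimizer.zero_grad()
    logits = model(images)
    loss = F.cross_entropy(logits, labels, ignore_index=255)
    # Fail fast: an overflowing or NaN loss means later iterations
    # would only train on corrupted weights.
    if not math.isfinite(loss.item()):
        raise RuntimeError(f'non-finite loss {loss.item()} at iter {it}')
    loss.backward()
    optimizer.step()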

bisenetv2

cfg = dict(
    model_type='bisenetv2',
    n_cats=1,
    num_aux_heads=4,
    lr_start=5e-3,
    weight_decay=1e-4,
    warmup_iters=1000,
    max_iter=180000,
    dataset='CocoStuff',
    im_root='/home/sp/ahs_online_project/BiSeNet-master/datasets/TXLF',
    train_im_anns='/home/sp/ahs_online_project/BiSeNet-master/datasets/TXLF/train.txt',
    val_im_anns='/home/sp/ahs_online_project/BiSeNet-master/datasets/TXLF/val.txt',
    scales=[0.25, 2.],
    cropsize=[128, 128],
    eval_crop=[128, 128],
    eval_scales=[0.5, 0.75, 1, 1.25, 1.5, 1.75],
    ims_per_gpu=8,
    eval_ims_per_gpu=1,
    use_fp16=True,
    use_sync_bn=False,
    respth='./res',
)
The above is my configuration file.
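
One way to check what n_cats should be for a custom dataset (a sketch, not from the thread) is to count the distinct pixel values across the annotation images. The label directory path below is hypothetical; point it at wherever the TXLF annotations actually live.

import numpy as np
from PIL import Image
from pathlib import Path

# Hypothetical label directory; adjust to the real annotation location.
label_dir = Path('/home/sp/ahs_online_project/BiSeNet-master/datasets/TXLF/labels')

values = set()
for p in label_dir.glob('*.png'):
    values |= set(np.unique(np.array(Image.open(p))).tolist())

# 255 is commonly reserved as the ignore label and is not a class.
classes = sorted(v for v in values if v != 255)
print('label values:', classes)  # n_cats should equal len(classes)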

Why did you set n_cats=1? n_cats should be the number of classes in your dataset.
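
For context, a minimal self-contained illustration (not from the thread): with n_cats=1 the network emits a single logit channel per pixel, and cross-entropy over a single class is identically zero, because the softmax of one logit is always 1. That is consistent with every loss_aux* entry in the log above reading 0.0000.

import torch
import torch.nn.functional as F

# With a single class, logits have shape (N, 1, H, W) and every valid
# target pixel must be 0.
logits = torch.randn(2, 1, 8, 8)
target = torch.zeros(2, 8, 8, dtype=torch.long)

loss = F.cross_entropy(logits, target)
print(loss.item())  # 0.0: log-softmax of a single logit is log(1) = 0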