Model training

Question

rose-jinyang opened this issue 4 years ago

Hello
How are you?
Thanks for contributing this project.
I have been training a new model with the config model_mobilenetv2_with_two_auxiliary_losses on the AISegment dataset for 8 days.
Strangely, the best model has not been updated for the last 7 days.
What do you think the reason is?

shivSD · Answer 1 · Tue Nov 17 2020 05:04:04 GMT+0800 (China Standard Time)

@rose-jinyang did you figure out the issue? We are having the same issue: the loss stops decreasing after a while.

liaohan · Answer 2 · Fri Mar 12 2021 17:35:06 GMT+0800 (China Standard Time)

I also have this problem. Try changing the learning rate (lr) and training for many epochs (at least 1000).
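One way to act on the "change the lr" suggestion is to lower the learning rate automatically once the validation loss stops improving. The sketch below is a plain-Python illustration of that idea, not PortraitNet's own schedule; the class name `PlateauLR` and its parameters (`factor`, `patience`, `min_lr`) are hypothetical and chosen only for this example.

```python
class PlateauLR:
    """Hypothetical helper: shrink the learning rate when the loss plateaus."""

    def __init__(self, lr, factor=0.5, patience=10, min_lr=1e-6):
        self.lr = lr                # current learning rate
        self.factor = factor        # multiplier applied on a plateau
        self.patience = patience    # epochs without improvement to tolerate
        self.min_lr = min_lr        # lower bound for the learning rate
        self.best = float("inf")    # best validation loss seen so far
        self.bad_epochs = 0         # consecutive epochs without improvement

    def step(self, val_loss):
        """Record one epoch's validation loss and return the lr to use next."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                # Plateau detected: reduce lr, but never below min_lr.
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr


# Usage with made-up loss values: two stagnant epochs trigger one reduction.
sched = PlateauLR(lr=0.001, factor=0.5, patience=2)
for loss in [1.0, 0.9, 0.9, 0.9]:
    current_lr = sched.step(loss)
```

In a PyTorch training loop the built-in `torch.optim.lr_scheduler.ReduceLROnPlateau` follows the same principle and would normally be preferable to a hand-written class.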

David-Hown · Answer 3 · Thu Aug 19 2021 15:28:35 GMT+0800 (China Standard Time)

Hello, could you please tell me whether I made a mistake when retraining the model? Do you have any good solutions?
load model init finish...
===========> training <===========
Epoch: [0][0/23] Lr-deconv: [0.0] Lr-other: [0.001] Loss 13.0424 (13.0424)
Traceback (most recent call last):
  File "train.py", line 809, in <module>
    main(args)
  File "train.py", line 774, in main
    train(dataLoader_train, netmodel, optimizer, epoch, logger_train, exp_args)
  File "train.py", line 519, in train
    logger.scalar_summary(tag, value, step=i)
  File "/data/data_hao/PFLD-pytorch/20210817/PortraitNet/util/logger.py", line 26, in scalar_summary
    tf.summary.scalar(tag, value, step=step)
  File "/root/.local/lib/python3.6/site-packages/tensorboard/plugins/scalar/summary_v2.py", line 49, in scalar
    with tf.summary.summary_scope(
AttributeError: module 'tensorboard.summary._tf.summary' has no attribute 'summary_scope'
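
The traceback shows TensorBoard's v2 scalar plugin (summary_v2.py) calling tf.summary.summary_scope, which the loaded TensorFlow summary module does not provide. One plausible cause, and this is an assumption because the thread does not list the installed versions, is a TensorFlow 1.x install paired with TensorBoard 2.x. The sketch below only reports the installed versions and applies a rough major-version comparison; the function names are hypothetical, and on Python 3.6 it relies on the `importlib_metadata` backport being installed.

```python
try:
    from importlib import metadata          # Python 3.8+
except ImportError:
    import importlib_metadata as metadata   # backport for older Pythons


def installed_versions(packages=("tensorflow", "tensorboard")):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def majors_match(versions):
    """Rough heuristic: True if all installed packages share a major version."""
    majors = {int(v.split(".")[0]) for v in versions.values() if v is not None}
    return len(majors) <= 1


if __name__ == "__main__":
    found = installed_versions()
    print(found)
    if not majors_match(found):
        print("tensorflow and tensorboard major versions differ")
```

If the versions do differ, installing a matching pair is one thing to try. Since the project is PyTorch-based, replacing the `tf.summary.scalar` call in util/logger.py with `torch.utils.tensorboard.SummaryWriter.add_scalar` is another option that avoids the TensorFlow dependency altogether.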