Infinite loss value when training under amp

jameslahm opened this issue 2 years ago

Hi, I hit the infinite-loss assertion failure when training with mixed precision (AMP).
The traceback looks like this:

Traceback (most recent call last):
  File "main.py", line 498, in <module>
    main(args)
  File "main.py", line 409, in main
    train_stats = train_one_epoch(
  File "ConvNeXt/engine.py", line 63, in train_one_epoch
    assert math.isfinite(loss_value)
AssertionError

How can I fix this problem? Thanks very much!
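For context, the assertion at engine.py line 63 fires whenever the loss is inf or NaN. One possible cause under AMP (an assumption here, not a diagnosis of this report) is float16 overflow: fp16 can only represent magnitudes up to 65504, so larger intermediate values become inf and can propagate into the loss. The minimal sketch below uses only the standard library (the half-precision `e` format of `struct`) to emulate an fp16 cast. The helper `to_fp16` and the loss values are hypothetical and do not come from the ConvNeXt code.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE half precision (binary16).

    Finite values whose magnitude exceeds the fp16 range (max 65504) make
    struct raise OverflowError; real fp16 arithmetic overflows to +/-inf
    instead, so we map that case to a signed infinity.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


# Hypothetical values, both exactly representable in fp16.
losses = [40000.0, 30000.0]

# Summed in full precision: 70000.0 is finite.
fp32_total = sum(losses)

# Summed with each operand and the result rounded to fp16: 70000 > 65504,
# so the result overflows to inf.
fp16_total = to_fp16(to_fp16(losses[0]) + to_fp16(losses[1]))

print(fp32_total, math.isfinite(fp32_total))  # 70000.0 True
print(fp16_total, math.isfinite(fp16_total))  # inf False
```

The same sum that is finite in full precision overflows in fp16, which is exactly the kind of value a check like `math.isfinite(loss_value)` rejects.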

uristern123 · Sun Apr 02 2023 22:20:30 GMT+0800 (China Standard Time)

Hi,
This happened to me as well. Did you find a solution to this problem?