open-mmlab / mmflow

OpenMMLab optical flow toolbox and benchmark

Home Page: https://mmflow.readthedocs.io/en/latest/

Why does GaussianNoise in data augmentation cause gradient explosion and a large grad_norm value?

pedroHuang123 opened this issue

Thanks for your error report and we appreciate it a lot.

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. I have read the FAQ documentation but cannot get the expected help.
  3. The bug has not been fixed in the latest version.

Describe the bug
A clear and concise description of what the bug is.

Reproduction

  1. What command or script did you run?
A placeholder for the command.
  2. Did you make any modifications on the code or config? Did you understand what you have modified?
  3. What dataset did you use?

Environment

  1. Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.
  2. You may add additional information that may be helpful for locating the problem, such as
    • How you installed PyTorch [e.g., pip, conda, source]
    • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

Error traceback
If applicable, paste the error traceback here.

A placeholder for the traceback.

Bug fix
If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

There are some questions we need to know:

  1. Which model are you training?
  2. Are you using the default config of MMFlow without any modifications?
  3. Can you provide the training log? Maybe we can debug together.

@Zachary-66 thanks
1. We want to finetune RAFT's mixed model on our datasets.
2. This is our config:

optimizer = dict(
    type='AdamW',
    lr=0.000125,
    betas=(0.9, 0.999),
    eps=1e-08,
    weight_decay=1e-05,
    amsgrad=False)
# optimizer_config = dict(grad_clip=dict(max_norm=0.1))
lr_config = dict(
    policy='OneCycle',
    max_lr=0.000125,
    total_steps=201000,
    pct_start=0.05,
    anneal_strategy='linear')
runner = dict(type='IterBasedRunner', max_iters=200000)
checkpoint_config = dict(by_epoch=False, interval=10000)
evaluation = dict(interval=10000, metric=['EPE', 'Fl'])

3. The training log is:
20221118_220707.log

After finetuning for 200k iterations, the EPE is larger than that of the original mixed model. @Zachary-66
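One detail in the config above: the grad_clip line is commented out, so gradients are not clipped at all during finetuning. A minimal sketch of re-enabling clipping through MMCV's standard optimizer_config hook (the max_norm value here is illustrative, not a tuned recommendation):

# Sketch: turn gradient clipping back on in the finetuning config.
# MMCV's OptimizerHook forwards these kwargs to torch.nn.utils.clip_grad_norm_,
# so grad_norm is capped at max_norm before each AdamW step.
optimizer_config = dict(grad_clip=dict(max_norm=1.0, norm_type=2))

With an IterBasedRunner this hook runs every iteration, so a spike in grad_norm caused by noisy augmented inputs is clipped before it can blow up the update.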

Do not use Gaussian noise.
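If this advice is followed, the noise augmentation can simply be dropped from the training pipeline rather than tuned. A minimal sketch, assuming the step is registered under the type name GaussianNoise as in the issue title (the rest of the pipeline is left untouched):

# Sketch: filter the Gaussian-noise step out of an existing MMFlow train pipeline.
# OpenMMLab pipelines are lists of dicts keyed by 'type'; all other steps are kept as-is.
train_pipeline = [step for step in train_pipeline if step.get('type') != 'GaussianNoise']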