open-mmlab / mmflow

OpenMMLab optical flow toolbox and benchmark

Home Page: https://mmflow.readthedocs.io/en/latest/

Why does GaussianNoise in data augmentation cause gradient explosion and a large grad_norm value?

pedroHuang123 opened this issue

Thanks for your error report and we appreciate it a lot.

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. I have read the FAQ documentation but cannot get the expected help.
  3. The bug has not been fixed in the latest version.

Describe the bug
A clear and concise description of what the bug is.

Reproduction

  1. What command or script did you run?
A placeholder for the command.
  2. Did you make any modifications on the code or config? Did you understand what you have modified?
  3. What dataset did you use?

Environment

  1. Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.
  2. You may add additional information that may be helpful for locating the problem, such as
    • How you installed PyTorch [e.g., pip, conda, source]
    • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

Error traceback
If applicable, paste the error traceback here.

A placeholder for the traceback.

Bug fix
If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

There are some questions we need to know:

  1. Which model are you training?
  2. Are you using the default config of MMFlow without any modifications?
  3. Can you provide the training log? Maybe we can debug together.

@Zachary-66 thanks
1. We want to finetune RAFT's mixed model on our datasets.
2. This is our config:

optimizer = dict(
    type='AdamW',
    lr=0.000125,
    betas=(0.9, 0.999),
    eps=1e-08,
    weight_decay=1e-05,
    amsgrad=False)
# optimizer_config = dict(grad_clip=dict(max_norm=0.1))
lr_config = dict(
    policy='OneCycle',
    max_lr=0.000125,
    total_steps=201000,
    pct_start=0.05,
    anneal_strategy='linear')
runner = dict(type='IterBasedRunner', max_iters=200000)
checkpoint_config = dict(by_epoch=False, interval=10000)
evaluation = dict(interval=10000, metric=['EPE', 'Fl'])

3. The training log is:
20221118_220707.log

After finetuning for 200k iterations, the EPE is larger than that of the original mixed model. @Zachary-66
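One detail in the config above: the grad_clip line is commented out, so gradients are not clipped at all during finetuning. A minimal sketch of re-enabling clipping through MMCV's standard optimizer_config hook (the max_norm value here is illustrative, not a tuned recommendation):

# Sketch: turn gradient clipping back on in the finetuning config.
# MMCV's OptimizerHook forwards these kwargs to torch.nn.utils.clip_grad_norm_,
# so grad_norm is capped at max_norm before each AdamW step.
optimizer_config = dict(grad_clip=dict(max_norm=1.0, norm_type=2))

With an IterBasedRunner this hook runs every iteration, so a spike in grad_norm caused by noisy augmented inputs is clipped before it can blow up the update.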

Do not use Gaussian noise.
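If this advice is followed, the noise augmentation can simply be dropped from the training pipeline rather than tuned. A minimal sketch, assuming the step is registered under the type name GaussianNoise as in the issue title (the rest of the pipeline is left untouched):

# Sketch: filter the Gaussian-noise step out of an existing MMFlow train pipeline.
# OpenMMLab pipelines are lists of dicts keyed by 'type'; all other steps are kept as-is.
train_pipeline = [step for step in train_pipeline if step.get('type') != 'GaussianNoise']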