Why does GaussianNoise in data augmentation cause gradient explosion and a large grad_norm value?
pedroHuang123 opened this issue · comments
Thanks for your error report and we appreciate it a lot.
Checklist
- I have searched related issues but cannot get the expected help.
- I have read the FAQ documentation but cannot get the expected help.
- The bug has not been fixed in the latest version.
Describe the bug
A clear and concise description of what the bug is.
Reproduction
- What command or script did you run?
A placeholder for the command.
- Did you make any modifications on the code or config? Did you understand what you have modified?
- What dataset did you use?
Environment
- Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.
- You may add additional information that may be helpful for locating the problem, such as:
- How you installed PyTorch [e.g., pip, conda, source]
- Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)
Error traceback
If applicable, paste the error traceback here.
A placeholder for the traceback.
Bug fix
If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!
There are some questions we need to know:
- Which model are you training?
- Are you using the default config of MMFlow without any modifications?
- Can you provide the training log? Maybe we can debug together.
@Zachary-66 thanks
1. We want to finetune RAFT's mixed model on our datasets.
2. This is our config:
optimizer = dict(
    type='AdamW',
    lr=0.000125,
    betas=(0.9, 0.999),
    eps=1e-08,
    weight_decay=1e-05,
    amsgrad=False)
# optimizer_config = dict(grad_clip=dict(max_norm=0.1))
lr_config = dict(
    policy='OneCycle',
    max_lr=0.000125,
    total_steps=201000,
    pct_start=0.05,
    anneal_strategy='linear')
runner = dict(type='IterBasedRunner', max_iters=200000)
checkpoint_config = dict(by_epoch=False, interval=10000)
evaluation = dict(interval=10000, metric=['EPE', 'Fl'])
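Note that the optimizer_config line enabling gradient clipping is commented out in this config. Re-enabling it is the usual first remedy when grad_norm blows up: in MMCV-based trainers, grad_clip=dict(max_norm=0.1) ends up calling torch.nn.utils.clip_grad_norm_ on the model parameters. A minimal sketch of the effect in plain PyTorch (toy values, not the MMFlow trainer):

```python
import torch

# toy parameter with an artificially huge gradient
p = torch.nn.Parameter(torch.ones(4))
p.grad = torch.full((4,), 100.0)  # total grad norm = sqrt(4 * 100^2) = 200

# clip the total gradient norm to 0.1, mirroring
# optimizer_config = dict(grad_clip=dict(max_norm=0.1))
total_norm = torch.nn.utils.clip_grad_norm_([p], max_norm=0.1)

print(float(total_norm))     # norm before clipping: 200.0
print(float(p.grad.norm()))  # norm after clipping: ~0.1
```

With clipping enabled, a single noisy batch can no longer produce an arbitrarily large update step, which often stabilizes finetuning at the cost of slightly slower progress.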
3. The training log is:
20221118_220707.log
After finetuning for 200k iterations, the EPE is larger than the original (mixed model) value. @Zachary-66
Do not use Gaussian noise.
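The suggestion to drop Gaussian noise is consistent with how an L2-style photometric loss behaves: the gradient with respect to the prediction is just the residual, so noise injected into the images inflates the gradient norm roughly in proportion to the noise level. A minimal NumPy sketch (image shape, sigma values, and the loss form are illustrative assumptions, not MMFlow's actual transform):

```python
import numpy as np

# toy image in [0, 1]; shape and sigmas are illustrative only
rng = np.random.default_rng(0)
img = rng.uniform(0.0, 1.0, size=(64, 64)).astype(np.float32)

def add_gaussian_noise(x, sigma, rng):
    """Additive Gaussian noise, as a typical augmentation applies it."""
    return x + rng.normal(0.0, sigma, size=x.shape).astype(np.float32)

# For an L2 loss 0.5 * ||pred - target||^2 the gradient w.r.t. the
# prediction is the residual (pred - target), so noisier targets
# directly inflate the gradient norm.
pred = img.copy()
norms = []
for sigma in (0.0, 0.05, 0.5):
    target = add_gaussian_noise(img, sigma, np.random.default_rng(1))
    grad = pred - target
    norms.append(float(np.linalg.norm(grad)))

print(norms)  # gradient norm grows with the noise level
```

If the noise sigma is large relative to the image value range, each batch effectively trains against corrupted targets, which can explain both the large grad_norm during finetuning and the worse final EPE.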