[BUG] grad_scale_func is not working when using Float16OptimizerWithFloat16Params, causing slow loss drops in fp16
liaosnow opened this issue · comments
Hi, I found a problem: grad_scale_func is not applied when using Float16OptimizerWithFloat16Params.
Line 269 in Megatron-DeepSpeed/megatron/core/pipeline_parallel/schedules.py does not take effect:
"output_tensor[0] = config.grad_scale_func(output_tensor[0])"
Here config.grad_scale_func is None.
At line 1144 in Megatron-DeepSpeed/megatron/training.py, "config.grad_scale_func = optimizer.scale_loss" runs, but "config = get_model_config(model)" does not appear to return the same config object that schedules.py later reads, so the assignment is lost.
Can you verify and fix this?
Same problem here! Can you fix it? @liaosnow
Workaround: pass "config" into the function as a parameter instead of calling "config = get_model_config(model)" again.
The method above works around the issue.
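A minimal sketch of what the bug and the workaround look like, using toy stand-in classes (ModelConfig, Model, scale_loss, forward_backward are all hypothetical placeholders, not Megatron's real implementations). The assumption illustrated: if get_model_config(model) hands back a different config object than the one training.py mutated, the grad_scale_func assignment never reaches schedules.py; passing the same config object down explicitly avoids that.

```python
class ModelConfig:
    """Toy stand-in for Megatron's model config."""
    def __init__(self):
        self.grad_scale_func = None


def get_model_config(model):
    # Toy stand-in illustrating the suspected failure mode: each call
    # returns a *fresh* config, so attributes set on one copy are lost.
    return ModelConfig()


def scale_loss(loss):
    # Placeholder for optimizer.scale_loss (fp16 loss scaling).
    return loss * 2.0


def forward_backward(config, loss):
    # schedules.py-style usage: scale only when grad_scale_func is set.
    if config.grad_scale_func is not None:
        loss = config.grad_scale_func(loss)
    return loss


model = object()  # model itself is irrelevant to the sketch

# Buggy path: set grad_scale_func on one copy, read from another copy.
cfg_set = get_model_config(model)
cfg_set.grad_scale_func = scale_loss          # training.py-style assignment
cfg_read = get_model_config(model)            # different object -> func is None
print(forward_backward(cfg_read, 1.0))        # loss is NOT scaled

# Workaround: pass the already-configured object down as a parameter.
print(forward_backward(cfg_set, 1.0))         # loss is scaled as intended
```

This is only a sketch of the mechanism; in the real codebase the fix is simply to thread the config object set in training.py through to the pipeline schedule instead of re-fetching it.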