zju3dv / Vox-Fusion

Code for "Dense Tracking and Mapping with Voxel-based Neural Implicit Representation", ISMAR 2022

The modification of the optimizer

JunyuanDeng opened this issue

I noticed that you fixed an optimizer error.

Originally, all parameter groups were gathered in a single list and passed to one Adam optimizer at the end:

    # gather all parameter groups into a single list
    optimize_params = [{'params': embeddings, 'lr': learning_rate[0]}]
    if update_decoder:
        optimize_params += [{'params': sdf_network.parameters(),
                             'lr': learning_rate[0]}]

    # add every non-initial keyframe pose as its own parameter group
    for keyframe in keyframe_graph:
        if keyframe.stamp != 0 and update_pose:
            keyframe.pose.requires_grad_(True)
            optimize_params += [{
                'params': keyframe.pose.parameters(), 'lr': learning_rate[1]
            }]

    # a single Adam optimizer over all groups
    optim = torch.optim.Adam(optimize_params)

Now, however, you initialize a separate Adam optimizer for each group of parameters. To my knowledge, there is no difference between the two approaches.
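
For clarity, something like the following is what I mean; the names map_optim, sdf_optim and pose_optims are just placeholders, not the exact code from the repo:

    map_optim = torch.optim.Adam([{'params': embeddings, 'lr': learning_rate[0]}])
    if update_decoder:
        sdf_optim = torch.optim.Adam(sdf_network.parameters(), lr=learning_rate[0])

    # one optimizer per keyframe pose instead of one shared optimizer
    pose_optims = []
    for keyframe in keyframe_graph:
        if keyframe.stamp != 0 and update_pose:
            keyframe.pose.requires_grad_(True)
            pose_optims.append(
                torch.optim.Adam(keyframe.pose.parameters(), lr=learning_rate[1]))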

Could you tell me the reason for doing it this way? I'm curious about it.

I have the same question.

Hi, @JIANG-CX and @Rotatingpencil,
The Adam optimizer is stateful; initializing a new one for each optimization loop discards the accumulated state (such as the momentum estimates), which can lead to oscillations in the optimized parameters (such as the poses). We observed a performance increase on the Replica dataset when we kept all the optimizers.
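
In other words, the idea is to keep the optimizers alive across mapping iterations instead of constructing new ones every time. A rough sketch of the pattern (not our exact code; get_pose_optim and pose_optims are only illustrative):

    import torch

    pose_optims = {}  # keyframe -> its persistent Adam optimizer

    def get_pose_optim(keyframe, lr):
        # build the optimizer only once per keyframe; later calls reuse it,
        # so the first/second-moment estimates accumulated so far are preserved
        if keyframe not in pose_optims:
            keyframe.pose.requires_grad_(True)
            pose_optims[keyframe] = torch.optim.Adam(
                keyframe.pose.parameters(), lr=lr)
        return pose_optims[keyframe]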

So optim.zero_grad() is not enough to remove the accumulated information?

optim.zero_grad() only clears the accumulated gradients of the optimized parameters. Keep in mind that the gradient is only an approximation in stochastic methods; modern optimizers therefore maintain additional state to determine the step length and step direction. I recommend you read the original Adam optimizer paper for more details.
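
You can check this yourself with a small standalone example (assuming a recent PyTorch version):

    import torch

    p = torch.nn.Parameter(torch.randn(3))
    optim = torch.optim.Adam([p], lr=1e-2)

    # one step populates Adam's internal state for p
    (p ** 2).sum().backward()
    optim.step()
    print(sorted(optim.state[p].keys()))  # ['exp_avg', 'exp_avg_sq', 'step']

    optim.zero_grad()                     # gradients are cleared ...
    print(optim.state[p]['exp_avg'])      # ... but the moment estimates remain

    # re-creating the optimizer is what actually discards the state
    fresh = torch.optim.Adam([p], lr=1e-2)
    print(len(fresh.state))               # 0 -> no accumulated information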

Thanks for your reply! It helped me learn a lot!