zju3dv / Vox-Fusion

Code for "Dense Tracking and Mapping with Voxel-based Neural Implicit Representation", ISMAR 2022

The modification of the optimizer

JunyuanDeng opened this issue

I noticed that you fixed an optimizer error.

Originally, all parameter groups were gathered in a single list and passed to one Adam optimizer at the end:

    # gather all parameter groups into a single list
    optimize_params = [{'params': embeddings, 'lr': learning_rate[0]}]
    if update_decoder:
        optimize_params += [{'params': sdf_network.parameters(),
                             'lr': learning_rate[0]}]

    # add every non-initial keyframe pose as its own parameter group
    for keyframe in keyframe_graph:
        if keyframe.stamp != 0 and update_pose:
            keyframe.pose.requires_grad_(True)
            optimize_params += [{
                'params': keyframe.pose.parameters(), 'lr': learning_rate[1]
            }]

    # a single Adam optimizer over all groups
    optim = torch.optim.Adam(optimize_params)

Now, however, you initialize a separate Adam optimizer for each group of parameters. To my knowledge, there is no difference between the two approaches.
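
For clarity, something like the following is what I mean; the names map_optim, sdf_optim and pose_optims are just placeholders, not the exact code from the repo:

    map_optim = torch.optim.Adam([{'params': embeddings, 'lr': learning_rate[0]}])
    if update_decoder:
        sdf_optim = torch.optim.Adam(sdf_network.parameters(), lr=learning_rate[0])

    # one optimizer per keyframe pose instead of one shared optimizer
    pose_optims = []
    for keyframe in keyframe_graph:
        if keyframe.stamp != 0 and update_pose:
            keyframe.pose.requires_grad_(True)
            pose_optims.append(
                torch.optim.Adam(keyframe.pose.parameters(), lr=learning_rate[1]))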

Could you tell me the reason for doing it this way? I'm curious about it.

I have the same question.

Hi, @JIANG-CX and @Rotatingpencil,
The Adam optimizer is stateful; initializing a new one for each optimization loop discards the accumulated state (such as the momentum estimates), which can lead to oscillations in the optimized parameters (such as the poses). We observed a performance increase on the Replica dataset when we kept all the optimizers.
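
In other words, the idea is to keep the optimizers alive across mapping iterations instead of constructing new ones every time. A rough sketch of the pattern (not our exact code; get_pose_optim and pose_optims are only illustrative):

    import torch

    pose_optims = {}  # keyframe -> its persistent Adam optimizer

    def get_pose_optim(keyframe, lr):
        # build the optimizer only once per keyframe; later calls reuse it,
        # so the first/second-moment estimates accumulated so far are preserved
        if keyframe not in pose_optims:
            keyframe.pose.requires_grad_(True)
            pose_optims[keyframe] = torch.optim.Adam(
                keyframe.pose.parameters(), lr=lr)
        return pose_optims[keyframe]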

So optim.zero_grad() is not enough to remove the accumulated information?

optim.zero_grad() only clears the accumulated gradients of the optimized parameters. Keep in mind that the gradient is only an approximation in stochastic methods; modern optimizers therefore maintain additional state to determine the step length and step direction. I recommend you read the original Adam optimizer paper for more details.
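
You can check this yourself with a small standalone example (assuming a recent PyTorch version):

    import torch

    p = torch.nn.Parameter(torch.randn(3))
    optim = torch.optim.Adam([p], lr=1e-2)

    # one step populates Adam's internal state for p
    (p ** 2).sum().backward()
    optim.step()
    print(sorted(optim.state[p].keys()))  # ['exp_avg', 'exp_avg_sq', 'step']

    optim.zero_grad()                     # gradients are cleared ...
    print(optim.state[p]['exp_avg'])      # ... but the moment estimates remain

    # re-creating the optimizer is what actually discards the state
    fresh = torch.optim.Adam([p], lr=1e-2)
    print(len(fresh.state))               # 0 -> no accumulated information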

Thanks for your reply! It helped me learn a lot!