princeton-vl / RAFT-3D


some questions about training parameters

jsczzzk opened this issue · comments

Thank you for your excellent work and code!
I have some questions about training parameters:
1. The weight decay used when fine-tuning RAFT on Things3D is 1e-4, but in this project it is 1e-5. Can you give some advice on the choice of weight decay?
2. The code uses the Adam optimizer instead of the AdamW optimizer described in the paper. In theory, AdamW overcomes Adam's shortcomings by decoupling weight decay from the adaptive update, so it should perform at least as well. In practice, is there a big difference between results optimized with Adam and with AdamW?
3. There was no gradient explosion during training, so does using gradient clipping affect the final accuracy?
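For reference, a minimal PyTorch sketch of the two things being asked about: Adam vs. AdamW with weight decay, and global-norm gradient clipping inside a training step. The model, data, and hyperparameter values here are illustrative only, not the RAFT-3D training setup.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)

# Adam folds weight decay into the gradient as L2 regularization;
# AdamW applies the decay decoupled from the adaptive update.
opt_adam = torch.optim.Adam(model.parameters(), lr=2e-4, weight_decay=1e-5)
opt_adamw = torch.optim.AdamW(model.parameters(), lr=2e-4, weight_decay=1e-5)

def train_step(optimizer, x, y, clip=1.0):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    # Gradient clipping caps the global gradient norm; it is cheap
    # insurance even when no gradient explosion is observed.
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
    optimizer.step()
    return loss.item()

x, y = torch.randn(4, 8), torch.randn(4, 2)
loss = train_step(opt_adamw, x, y)
```

Swapping `opt_adamw` for `opt_adam` changes only how the decay term interacts with the adaptive moments, which is exactly the difference the question is about.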

Looking forward to your response, thanks in advance!

Hi @zachteed, I have some similar questions about configurations that are not consistent with the paper. I would really appreciate it if you could clarify. Thanks :D!

  1. In the paper the crop size is 320x720, while in the code it is 368x768 when training on Scene Flow.
  2. In the paper the learning rate is 1e-4, while in the code it is 2e-4 when training on Scene Flow.
  3. How many GPUs were used? Are the 200,000 iterations on one GPU or spread across multiple GPUs? This changes the effective total number of training steps.
  4. The Adam vs. AdamW optimizer question raised by @jsczzzk.
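To keep the two hyperparameter sets straight, here is a small summary of the values mentioned in this thread, written as plain Python dicts. The dict keys are illustrative names, not identifiers from the RAFT-3D codebase.

```python
# Paper vs. released-code settings for Scene Flow training,
# as quoted in the discussion above.
paper_cfg = {
    "crop_size": (320, 720),
    "lr": 1e-4,
    "optimizer": "AdamW",
}
code_cfg = {
    "crop_size": (368, 768),
    "lr": 2e-4,
    "optimizer": "Adam",
    "weight_decay": 1e-5,
    "iterations": 200_000,
}
```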

Hi, there have been a few minor updates to the code. I reimplemented the least-squares solver to improve stability and reduce memory use, so it is now possible to train with a larger learning rate and image size. I reran the full training pipeline and got 5.10% SF error on KITTI vs. the 5.77% reported in the paper. Adam vs. AdamW made no difference in my experience. All experiments used 1 GPU.

Thanks @zachteed, I really appreciate it! Safe to close this issue if @jsczzzk is ok with this.

Thank you for your patient answer :)