xxlong0 / Wonder3D

Single Image to 3D using Cross-Domain Diffusion for 3D Generation

Home Page: https://www.xxlong.site/Wonder3D/

Cannot reproduce the result.

recordmp3 opened this issue

I appreciate your wonderful work! The demo is excellent!
However, when I try to reproduce the result using your default settings (just running accelerate launch --config_file 8gpu.yaml train_mvdiffusion_image.py --config configs/train/stage1-mix-6views-lvis.yaml, without changing any yaml except the data path), I find that the network fails to produce a reasonable result.
After finishing the first-stage training (30k iterations), the network does not learn when to generate RGB or normal maps (it generates them randomly regardless of the domain switch), does not learn to generate a white background (it is often gray), and does not learn to generate the correct view (the front view sometimes produces a right-side image, or the right view produces a left or back image).

The ground truth and inference outputs are attached here. Do you have any idea how to solve this? Thank you in advance!

[image: 30000-validation_train-gt]
[image: 30000-validation_train-sample_cfg1.0]

Our model is trained on 8 GPUs with a total batch size of 256 after gradient accumulation. Using only one GPU will be infeasible.
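The effective batch size is the product of the GPU count, the per-GPU batch size, and the number of gradient-accumulation steps; a minimal sketch of that arithmetic with hypothetical per-GPU values (the real numbers live in the training yaml and the accelerate config):

```python
# Illustrative arithmetic only; the per-GPU batch size and accumulation steps
# below are hypothetical placeholders, not the values from Wonder3D's configs.
num_gpus = 8
per_gpu_batch_size = 8
gradient_accumulation_steps = 4

effective_batch_size = num_gpus * per_gpu_batch_size * gradient_accumulation_steps
assert effective_batch_size == 256
```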

Oh sorry, I did use 8 GPUs with batch size 256 (it was wrongly written as 1 GPU in my command above, but yes, I'm using 8 GPUs :), and that still does not work.

I also tried overfitting to only one scene, outputting only normals, and after 1k iterations with batch size 256, without changing any of your code, it still does not learn to generate the correct view (the front view sometimes produces a right-side image, or the right view produces a left or back image).

Hello, in our experiments we also found that overfitting on one scene is not feasible. How much data did you use for training in this experiment?

Hi @flamehaze1115, I tried 23k Objaverse scenes, 32 scenes, and 1 scene respectively; they all failed. Could you please double-check that configs/train/stage1-mix-6views-lvis.yaml is the correct config you used for training?

Because I have currently found another problem: if I set the hyperparameter train_dataset:bg_color to "three_choices", the same as your config, the background during inference is often grey even when the single input image has a white background.
[image: 30000-3-validation-sample_cfg1.0]

However, if I set that hyperparameter to "white", the wrong grey background disappears, which may indicate that configs/train/stage1-mix-6views-lvis.yaml is accidentally a little different from your actual setting.
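For reference, the option being discussed sits under the dataset section of the stage-1 yaml; a rough sketch (indentation and surrounding structure are assumed, only the bg_color values come from this thread):

```yaml
train_dataset:
  # "three_choices" presumably samples the training background from a small set
  # of colors; switching it to "white" was the workaround tried above.
  bg_color: "three_choices"
```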

I find that the stage-1 training yaml has zero_init_camera_projection: true, which causes different camera embeddings to become identical after the 2-layer MLP projection. May I know whether this is on purpose?
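A minimal sketch (not Wonder3D's actual module) of why this is suspicious: if the projection's output layer is zero-initialized, every camera embedding is mapped to the zero vector, so different views start out indistinguishable.

```python
import torch
import torch.nn as nn

# Toy 2-layer MLP standing in for the camera-embedding projection; the real
# Wonder3D module may differ. Zero-initializing the output layer, as
# zero_init_camera_projection: true suggests, sends every embedding to zeros.
proj = nn.Sequential(nn.Linear(16, 64), nn.SiLU(), nn.Linear(64, 64))
nn.init.zeros_(proj[-1].weight)
nn.init.zeros_(proj[-1].bias)

front_cam, back_cam = torch.randn(16), torch.randn(16)
with torch.no_grad():
    print(torch.equal(proj(front_cam), proj(back_cam)))  # True: both outputs are all zeros
```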

I also have the same problem after stage-1 training. Did you fix it?
Also, I'm not sure which stage-1 checkpoint should be put into the stage-2 training yaml. Did you train stage 2 successfully?

Same problem here. Is there anything wrong in the code release?

The gray background is expected and indicates that your training has not converged. When your model converges well, the predicted images will have a white background.

Fixed a severe training bug: "zero_init_camera_projection" in configs/train/stage1-mix-6views-lvis.yaml should be False. Otherwise, the domain control and pose control are invalid during training.
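So the relevant line of the stage-1 config should read as follows (only this key is taken from the fix above):

```yaml
# configs/train/stage1-mix-6views-lvis.yaml
zero_init_camera_projection: false
```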

I have the same problem with loading the checkpoint for stage 2. The missing weights are related to the 'joint' blocks.
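As a toy illustration (not the real Wonder3D modules) of what that message means: the stage-2 model contains 'joint' blocks that have no counterpart in a stage-1 checkpoint, so a non-strict load reports them as missing and leaves them at their initialization.

```python
import torch.nn as nn

# Toy stand-ins: "stage2" adds a joint block that the stage-1 checkpoint lacks.
stage1 = nn.ModuleDict({"backbone": nn.Linear(4, 4)})
stage2 = nn.ModuleDict({"backbone": nn.Linear(4, 4), "joint_block": nn.Linear(4, 4)})

ckpt = stage1.state_dict()  # stand-in for the stage-1 checkpoint
missing, unexpected = stage2.load_state_dict(ckpt, strict=False)
print(missing)     # ['joint_block.weight', 'joint_block.bias']
print(unexpected)  # []
```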

Have you resolved the issue? Did you change the bg_color to white in the config?

Did you finally change the bg_color to white in the config?

@aquarter147 @GliAmanti @xxlong0 Which checkpoint should be loaded for stage 2, and how can the missing-weights problem be solved? May I ask whether you solved this?
Has anyone successfully run stage 2? ...