xxlong0 / Wonder3D

Single Image to 3D using Cross-Domain Diffusion for 3D Generation

Home Page: https://www.xxlong.site/Wonder3D/

Cannot reproduce the result.

recordmp3 opened this issue

I appreciate your wonderful work! The demo is excellent!
However, when I try to reproduce the result using your default settings (just running accelerate launch --config_file 8gpu.yaml train_mvdiffusion_image.py --config configs/train/stage1-mix-6views-lvis.yaml, without changing any yaml except the data path), I find that the network fails to produce a reasonable result.
After finishing the first-stage training (30k iterations), the network does not learn when to generate RGB or normal maps (it generates them randomly regardless of the domain switch), does not learn to generate a white background (it is often gray), and does not learn to generate the correct view (the front view sometimes produces a right-side image, or the right view produces a left or back image).

The ground truth and inference outputs are attached here. Do you have any idea how to solve this? Thank you in advance!

[image: 30000-validation_train-gt]
[image: 30000-validation_train-sample_cfg1.0]

Our model is trained on 8 GPUs with a total batch size of 256 after gradient accumulation. Using only one GPU will be infeasible.
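The effective batch size is the product of the GPU count, the per-GPU batch size, and the number of gradient-accumulation steps; a minimal sketch of that arithmetic with hypothetical per-GPU values (the real numbers live in the training yaml and the accelerate config):

```python
# Illustrative arithmetic only; the per-GPU batch size and accumulation steps
# below are hypothetical placeholders, not the values from Wonder3D's configs.
num_gpus = 8
per_gpu_batch_size = 8
gradient_accumulation_steps = 4

effective_batch_size = num_gpus * per_gpu_batch_size * gradient_accumulation_steps
assert effective_batch_size == 256
```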

Oh sorry, I did use 8 GPUs with batch size 256 (it was wrongly written as 1 GPU in my command above, but yes, I'm using 8 GPUs :), and that still does not work.

I also tried overfitting to only one scene, outputting only normals, and after 1k iterations with batch size 256, without changing any of your code, it still does not learn to generate the correct view (the front view sometimes produces a right-side image, or the right view produces a left or back image).

Hello, in our experiments we also found that overfitting on one scene is not feasible. How much data did you use for training in this experiment?

Hi @flamehaze1115, I tried 23k Objaverse scenes, 32 scenes, and 1 scene respectively; they all failed. Could you please double-check that configs/train/stage1-mix-6views-lvis.yaml is the correct config you used for training?

Because I have currently found another problem: if I set the hyperparameter train_dataset:bg_color to "three_choices", the same as your config, the background during inference is often grey even when the single input image has a white background.
[image: 30000-3-validation-sample_cfg1.0]

However, if I set that hyperparameter to "white", the wrong grey background disappears, which may indicate that configs/train/stage1-mix-6views-lvis.yaml is accidentally a little different from your actual setting.
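For reference, the option being discussed sits under the dataset section of the stage-1 yaml; a rough sketch (indentation and surrounding structure are assumed, only the bg_color values come from this thread):

```yaml
train_dataset:
  # "three_choices" presumably samples the training background from a small set
  # of colors; switching it to "white" was the workaround tried above.
  bg_color: "three_choices"
```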

I find that the stage-1 training yaml has zero_init_camera_projection: true, which causes different camera embeddings to become identical after the 2-layer MLP projection. May I know whether this is on purpose?
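A minimal sketch (not Wonder3D's actual module) of why this is suspicious: if the projection's output layer is zero-initialized, every camera embedding is mapped to the zero vector, so different views start out indistinguishable.

```python
import torch
import torch.nn as nn

# Toy 2-layer MLP standing in for the camera-embedding projection; the real
# Wonder3D module may differ. Zero-initializing the output layer, as
# zero_init_camera_projection: true suggests, sends every embedding to zeros.
proj = nn.Sequential(nn.Linear(16, 64), nn.SiLU(), nn.Linear(64, 64))
nn.init.zeros_(proj[-1].weight)
nn.init.zeros_(proj[-1].bias)

front_cam, back_cam = torch.randn(16), torch.randn(16)
with torch.no_grad():
    print(torch.equal(proj(front_cam), proj(back_cam)))  # True: both outputs are all zeros
```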

I also have the same problem after stage-1 training. Did you fix it?
Also, I'm not sure which stage-1 checkpoint should be put into the stage-2 training yaml. Did you train stage 2 successfully?

Same problem here. Is there anything wrong in the code release?

The gray background is expected and indicates that your training has not converged. When your model converges well, the predicted images will have a white background.

Fixed a severe training bug: "zero_init_camera_projection" in configs/train/stage1-mix-6views-lvis.yaml should be False. Otherwise, the domain control and pose control are invalid during training.
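So the relevant line of the stage-1 config should read as follows (only this key is taken from the fix above):

```yaml
# configs/train/stage1-mix-6views-lvis.yaml
zero_init_camera_projection: false
```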

I have the same problem with loading the checkpoint for stage 2. The missing weights are related to the 'joint' blocks.
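As a toy illustration (not the real Wonder3D modules) of what that message means: the stage-2 model contains 'joint' blocks that have no counterpart in a stage-1 checkpoint, so a non-strict load reports them as missing and leaves them at their initialization.

```python
import torch.nn as nn

# Toy stand-ins: "stage2" adds a joint block that the stage-1 checkpoint lacks.
stage1 = nn.ModuleDict({"backbone": nn.Linear(4, 4)})
stage2 = nn.ModuleDict({"backbone": nn.Linear(4, 4), "joint_block": nn.Linear(4, 4)})

ckpt = stage1.state_dict()  # stand-in for the stage-1 checkpoint
missing, unexpected = stage2.load_state_dict(ckpt, strict=False)
print(missing)     # ['joint_block.weight', 'joint_block.bias']
print(unexpected)  # []
```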

Have you resolved the issue? Did you change the bg_color to white in the config?

Did you finally change the bg_color to white in the config?

@aquarter147 @GliAmanti @xxlong0 Which checkpoint should be loaded for stage 2, and how can the missing-weights problem be solved? May I ask whether you solved this?
Has anyone successfully run stage 2? ...