deepseek-ai / DreamCraft3D

[ICLR 2024] Official implementation of DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior

Home Page: https://mrtornado24.github.io/DreamCraft3D/


RuntimeError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient.

boltron1 opened this issue · comments

I ran this command for stage 2:
python launch.py --config custom/threestudio-dreamcraft3D/configs/dreamcraft3d-geometry.yaml --train system.prompt_processor.prompt="a cartoon boy king in robotic knight armor" data.image_path="./load/images/rey_rgba.png" system.geometry_convert_from="./outputs/dreamcraft3d-coarse-neus/a_cartoon_boy_king_in_robotic_knight_armor@20240302-113207/ckpts/last.ckpt"

I get this error and don't understand why. Any help resolving this?

Traceback (most recent call last):
File "/home/boltron/threestudio/launch.py", line 309, in
main(args, extras)
File "/home/boltron/threestudio/launch.py", line 252, in main
trainer.fit(system, datamodule=dm, ckpt_path=cfg.resume)
File "/home/boltron/anaconda3/envs/threestudio/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 543, in fit
call._call_and_handle_interrupt(
File "/home/boltron/anaconda3/envs/threestudio/lib/python3.11/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/boltron/anaconda3/envs/threestudio/lib/python3.11/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch
return function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/boltron/anaconda3/envs/threestudio/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 579, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/home/boltron/anaconda3/envs/threestudio/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 962, in _run
self.strategy.setup(self)
File "/home/boltron/anaconda3/envs/threestudio/lib/python3.11/site-packages/pytorch_lightning/strategies/ddp.py", line 171, in setup
self.configure_ddp()
File "/home/boltron/anaconda3/envs/threestudio/lib/python3.11/site-packages/pytorch_lightning/strategies/ddp.py", line 283, in configure_ddp
self.model = self._setup_model(self.model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/boltron/anaconda3/envs/threestudio/lib/python3.11/site-packages/pytorch_lightning/strategies/ddp.py", line 195, in _setup_model
return DistributedDataParallel(module=model, device_ids=device_ids, **self._ddp_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/boltron/anaconda3/envs/threestudio/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 678, in init
self._log_and_throw(
File "/home/boltron/anaconda3/envs/threestudio/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1037, in _log_and_throw
raise err_type(err_msg)
RuntimeError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient.
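
For anyone hitting the same error: PyTorch raises it when the module handed to DistributedDataParallel has no parameter with requires_grad=True, i.e. the whole system is frozen at the moment Lightning wraps it for multi-GPU training. A minimal sketch of that condition in plain PyTorch (not threestudio's API, just an illustration):

import torch.nn as nn

def trainable_parameter_count(module: nn.Module) -> int:
    # Count parameters that still require gradients.
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

# Toy module with every parameter frozen; wrapping it in DistributedDataParallel
# would raise the same RuntimeError shown in the traceback above.
model = nn.Linear(4, 4)
for p in model.parameters():
    p.requires_grad_(False)

print(trainable_parameter_count(model))  # prints 0 -> nothing for DDP to train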

It seems the issue was something in the threestudio extension version. I installed DreamCraft3D in its own environment from this repo, and stage 2 was then able to train.
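
In case it helps anyone who can't switch environments: the crash only occurs when Lightning selects the multi-GPU DDP strategy, so restricting the run to a single GPU should at least avoid the DDP wrapping (though if nothing in the system is actually trainable, the root cause is still the extension version noted above). Assuming the --gpu flag from the threestudio README applies to this extension setup, something like:

python launch.py --config custom/threestudio-dreamcraft3D/configs/dreamcraft3d-geometry.yaml --train --gpu 0 system.prompt_processor.prompt="a cartoon boy king in robotic knight armor" data.image_path="./load/images/rey_rgba.png" system.geometry_convert_from="./outputs/dreamcraft3d-coarse-neus/a_cartoon_boy_king_in_robotic_knight_armor@20240302-113207/ckpts/last.ckpt"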