ashleve / lightning-hydra-template

PyTorch Lightning + Hydra. A very user-friendly template for ML experimentation. ⚡🔥⚡

cannot use "trainer=ddp"

a4152684 opened this issue

Training works well with "trainer=gpu", but when I change it to "trainer=ddp" it fails with the traceback below.
Could you please help me?
Traceback (most recent call last):
  File "/home/lcbryant/cz_nerf/p_nerf/src/utils/utils.py", line 38, in wrap
    metric_dict, object_dict = task_func(cfg=cfg)
  File "/home/lcbryant/cz_nerf/p_nerf/train.py", line 78, in train
    trainer.fit(model=model, datamodule=datamodule, ckpt_path=cfg.get("ckpt_path"))
  File "/home/lcbryant/anaconda3/envs/cz_nerf/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 582, in fit
    call._call_and_handle_interrupt(
  File "/home/lcbryant/anaconda3/envs/cz_nerf/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 36, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/home/lcbryant/anaconda3/envs/cz_nerf/lib/python3.9/site-packages/pytorch_lightning/strategies/launchers/multiprocessing.py", line 113, in launch
    mp.start_processes(
  File "/home/lcbryant/anaconda3/envs/cz_nerf/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 189, in start_processes
    process.start()
  File "/home/lcbryant/anaconda3/envs/cz_nerf/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/home/lcbryant/anaconda3/envs/cz_nerf/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/home/lcbryant/anaconda3/envs/cz_nerf/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/lcbryant/anaconda3/envs/cz_nerf/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/lcbryant/anaconda3/envs/cz_nerf/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/home/lcbryant/anaconda3/envs/cz_nerf/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'get_embedder.<locals>.<lambda>'
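
The traceback ends inside the spawn-based multiprocessing launcher, which pickles the objects it hands to each worker process. Its last line says pickle could not serialize an object defined locally inside get_embedder. The sketch below reproduces that kind of failure in isolation and shows one possible way around it. The bodies of get_embedder_local and get_embedder_picklable are hypothetical stand-ins, not the code from the project in the traceback, and the multires parameter is only an assumption for illustration.

```python
import operator
import pickle
from functools import partial


def get_embedder_local(multires):
    # Hypothetical stand-in for the failing pattern: the returned callable
    # is a lambda created inside the function, so pickle cannot find it by
    # an importable name.
    scale = 2.0 ** multires
    return lambda x: scale * x


def get_embedder_picklable(multires):
    # One possible fix: build the callable from an importable module-level
    # function plus bound arguments. functools.partial objects can be
    # pickled as long as the wrapped function and its arguments can.
    scale = 2.0 ** multires
    return partial(operator.mul, scale)


def can_pickle(obj):
    # Returns True if pickle.dumps succeeds, False if it rejects the object.
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False


if __name__ == "__main__":
    print("local lambda picklable:", can_pickle(get_embedder_local(3)))
    print("partial picklable:", can_pickle(get_embedder_picklable(3)))
```

The same idea applies to a function or callable class of your own, provided it is defined at module level rather than inside another function. Whether this is enough for the model in this issue depends on what else it stores as attributes.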