Model train execution fails with a PyTorch error message.
shivamsnaik opened this issue
Hello,
I would like to ask if anyone is facing the below issue:
TypeError: add(): argument 'alpha' must be Number, not NoneType
The steps I followed are:
- Installed the package:
python -m pip install -e DynamicHead
- Added a custom dataset using register_coco_instances.
- Updated the config in def setup(args) with the custom dataset name and the number of classes:
cfg.DATASETS.TRAIN = ('coco_docbank_train',)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 13
- Ran the model:
python train_net.py --config configs/dyhead_r50_rcnn_fpn_1x.yaml --num-gpus 1
The detailed error message goes like this:
File "train_net.py", line 204, in <module>
launch(
File "/opt/conda/lib/python3.8/site-packages/detectron2/engine/launch.py", line 82, in launch
main_func(*args)
File "train_net.py", line 198, in main
return trainer.train()
File "/opt/conda/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 484, in train
super().train(self.start_iter, self.max_iter)
File "/opt/conda/lib/python3.8/site-packages/detectron2/engine/train_loop.py", line 149, in train
self.run_step()
File "/opt/conda/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 494, in run_step
self._trainer.run_step()
File "/opt/conda/lib/python3.8/site-packages/detectron2/engine/train_loop.py", line 294, in run_step
self.optimizer.step()
File "/opt/conda/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
return wrapped(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/torch/optim/optimizer.py", line 88, in wrapper
return func(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/torch/optim/sgd.py", line 136, in step
F.sgd(params_with_grad,
File "/opt/conda/lib/python3.8/site-packages/torch/optim/_functional.py", line 164, in sgd
d_p = d_p.add(param, alpha=weight_decay)
TypeError: add(): argument 'alpha' must be Number, not NoneType
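The last frame of the traceback shows weight_decay reaching torch/optim/_functional.py as None. As a rough diagnostic sketch, a check like the one below could be run on optimizer.param_groups (a list of dicts in PyTorch) before training starts. The helper name find_bad_weight_decay is hypothetical, and plain dicts stand in for real parameter groups here.

```python
from numbers import Number


def find_bad_weight_decay(param_groups):
    """Return (index, value) pairs for groups whose weight_decay is not a number."""
    bad = []
    for index, group in enumerate(param_groups):
        value = group.get("weight_decay")
        # bool counts as a Number in Python but is not a meaningful decay value.
        if isinstance(value, bool) or not isinstance(value, Number):
            bad.append((index, value))
    return bad


# Plain dicts standing in for optimizer.param_groups (illustrative values only).
example_groups = [
    {"lr": 0.02, "weight_decay": 0.0001},
    {"lr": 0.02, "weight_decay": None},  # the kind of group that triggers the TypeError
]
print(find_bad_weight_decay(example_groups))  # [(1, None)]
```

With a real optimizer, the same function could be called as find_bad_weight_decay(optimizer.param_groups) to see which groups were built without a numeric decay.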
Environment Details:
sys.platform = linux
Python = 3.8.12
numpy = 1.21.2
detectron2 = 0.6
CUDA = 11.4
PyTorch = 1.10.0
torchvision = 0.11.0a0
fvcore = 0.1.5.post20211023
iopath = 0.1.9
cv2 = 4.5.4
I would appreciate any help if you know the solution.
Hi, I have the same problem as you. Have you solved it?
I have the same problem. Have you solved it?
@Houseqin @MajorityRreport Hi. Yes, I did solve the issue.
The error occurs because the weight decay value is not passed to PyTorch, so the optimizer receives None.
Add the following line to your config YAML file to solve this issue:
cfg.SOLVER.WEIGHT_DECAY_BIAS = <desired_value>
This was causing the issue in my case. Also, if cfg.SOLVER.WEIGHT_DECAY is missing in your config, include that as well.
That should solve the issue.
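One way to guard against an unset value is sketched below. It assumes the optimizer is built by looping over named parameters and choosing a decay for each one; the actual logic in train_net.py may differ. The sketch falls back to the general weight decay when no bias-specific value is set. pick_weight_decay is a hypothetical helper, not part of Detectron2 or DynamicHead.

```python
def pick_weight_decay(param_name, weight_decay, weight_decay_bias=None):
    """Choose the decay for one parameter, never returning None."""
    if weight_decay is None:
        raise ValueError("SOLVER.WEIGHT_DECAY must be set to a number")
    # Use the bias-specific decay only when it was actually configured.
    if param_name.endswith("bias") and weight_decay_bias is not None:
        return weight_decay_bias
    return weight_decay


# Parameter names below are illustrative, not taken from the model.
print(pick_weight_decay("backbone.conv1.weight", 0.0001))    # 0.0001
print(pick_weight_decay("backbone.conv1.bias", 0.0001))      # 0.0001 (fallback)
print(pick_weight_decay("backbone.conv1.bias", 0.0001, 0.0)) # 0.0
```

Raising early on a missing WEIGHT_DECAY gives a clearer message than the TypeError that otherwise surfaces deep inside optimizer.step().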
Thank you very much!!
@MajorityRreport Glad to help. Let me know if it still throws the same error.
I still get the same problem with the official config YAML dyhead_swint_atss_fpn_2x_ms.yaml, even after adding cfg.SOLVER.WEIGHT_DECAY_BIAS and cfg.SOLVER.WEIGHT_DECAY.
@Houseqin I have never seen this error before. Is it hindering normal operation?
If yes, you can try replicating the versions of CUDA, PyTorch, etc. that I've listed in the issue description.
It works perfectly for me after the changes mentioned above.
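To compare a local environment against the versions listed in the issue description, a small standard-library check along these lines may help. It is only a sketch: the dictionary keys are assumed to be the pip distribution names, and compare_versions is a hypothetical helper. CUDA and cv2 are left out because they are not covered by this kind of lookup.

```python
from importlib import metadata

# Versions reported in the issue description; keys are assumed pip distribution names.
REPORTED_VERSIONS = {
    "torch": "1.10.0",
    "torchvision": "0.11.0a0",
    "detectron2": "0.6",
    "numpy": "1.21.2",
    "fvcore": "0.1.5.post20211023",
    "iopath": "0.1.9",
}


def compare_versions(expected):
    """Map each package whose installed version differs to (expected, installed)."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            # Drop any local build suffix such as "+cu113" before comparing.
            installed = metadata.version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in compare_versions(REPORTED_VERSIONS).items():
    print(f"{name}: issue reports {wanted}, found {installed or 'not installed'}")
```

An empty result means the listed packages match the reported versions exactly.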
I found that it was a compilation problem, and I fixed it by following the guidance for "nvcc not found" or "Not compiled with GPU support" or "Detectron2 CUDA Compiler: not available"
at https://detectron2.readthedocs.io/en/latest/tutorials/install.html#common-installation-issues
Thank you most sincerely.
cfg.SOLVER.WEIGHT_DECAY_BIAS and cfg.SOLVER.WEIGHT_DECAY
Hi, could you please tell me what values should be set for these decay settings? Thanks.
WEIGHT_DECAY: 0.05
WEIGHT_DECAY_BIAS: 0.01
Sorry, I'm just a beginner. I set them like this (arbitrary numbers), and the network could run. But maybe the network didn't suit my dataset, because it didn't work well.
I hope it helps.