microsoft / DynamicHead

Model train execution fails with a PyTorch error message.

shivamsnaik opened this issue

Hello,
I would like to ask if anyone is facing the below issue:
TypeError: add(): argument 'alpha' must be Number, not NoneType

The steps I followed are:

  1. python -m pip install -e DynamicHead
  2. Added a custom dataset using register_coco_instances.
  3. Updated the config in def setup(args) with the custom dataset name and the number of classes:
    cfg.DATASETS.TRAIN = ('coco_docbank_train',)
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = 13
  4. Ran the model:
    python train_net.py --config configs/dyhead_r50_rcnn_fpn_1x.yaml --num-gpus 1

The detailed error message goes like this:

File "train_net.py", line 204, in <module> 
    launch(
  File "/opt/conda/lib/python3.8/site-packages/detectron2/engine/launch.py", line 82, in launch
    main_func(*args)
  File "train_net.py", line 198, in main
    return trainer.train()
  File "/opt/conda/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 484, in train
    super().train(self.start_iter, self.max_iter)
  File "/opt/conda/lib/python3.8/site-packages/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/opt/conda/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 494, in run_step
    self._trainer.run_step()
  File "/opt/conda/lib/python3.8/site-packages/detectron2/engine/train_loop.py", line 294, in run_step
    self.optimizer.step()
  File "/opt/conda/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
    return wrapped(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/optim/sgd.py", line 136, in step
    F.sgd(params_with_grad,
  File "/opt/conda/lib/python3.8/site-packages/torch/optim/_functional.py", line 164, in sgd
    d_p = d_p.add(param, alpha=weight_decay)
TypeError: add(): argument 'alpha' must be Number, not NoneType
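
The last frame shows that the optimizer's weight_decay reached torch's SGD step as None. As a rough way to confirm this before training, the sketch below scans parameter groups for an unset weight decay. The helper name find_unset_weight_decay and the learning-rate values are hypothetical; the plain dicts stand in for the entries of optimizer.param_groups.

```python
def find_unset_weight_decay(param_groups):
    """Return indices of parameter groups whose weight_decay is None (or absent)."""
    return [
        index
        for index, group in enumerate(param_groups)
        if group.get("weight_decay") is None
    ]


# Plain dicts standing in for optimizer.param_groups (values are made up).
groups = [
    {"lr": 0.02, "weight_decay": 0.0001},
    {"lr": 0.02, "weight_decay": None},  # this group would hit the TypeError
]
bad = find_unset_weight_decay(groups)
print("groups with unset weight_decay:", bad)
```

With a real optimizer, the same function could be called as find_unset_weight_decay(optimizer.param_groups) right after the optimizer is built.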

Environment Details:
sys.platform = linux
Python = 3.8.12
numpy = 1.21.2
detectron2 = 0.6
CUDA = 11.4
PyTorch = 1.10.0
torchvision = 0.11.0a0
fvcore = 0.1.5.post20211023
iopath = 0.1.9
cv2 = 4.5.4

I would appreciate any help if you know the solution.

Hi, I have the same problem as you. Have you solved it?

I have the same problem. Have you solved it?

@Houseqin @MajorityRreport Hi. Yes, I did solve the issue.

The error occurs because no weight decay value is passed to PyTorch, so the optimizer receives None.
Add the following setting to your config to solve this (in the YAML file, this is the WEIGHT_DECAY_BIAS key under SOLVER):
cfg.SOLVER.WEIGHT_DECAY_BIAS = <desired_value>

This was the cause in my case. Also, if cfg.SOLVER.WEIGHT_DECAY is missing from your config, set that as well.

That should solve the issue.
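
As an illustration of the fix above, here is a minimal sketch that fills in unset weight-decay settings on a SOLVER-like object. The helper name ensure_weight_decay and the fallback value 1e-4 are assumptions for illustration, as is the choice to let the bias value follow the general weight decay when it is unset. A SimpleNamespace stands in for cfg.SOLVER so the snippet runs without detectron2.

```python
from types import SimpleNamespace


def ensure_weight_decay(solver, fallback=1e-4):
    """Fill in missing or None weight-decay settings on a SOLVER-like object."""
    if getattr(solver, "WEIGHT_DECAY", None) is None:
        # Assumption: use a hypothetical fallback when nothing is configured.
        solver.WEIGHT_DECAY = fallback
    if getattr(solver, "WEIGHT_DECAY_BIAS", None) is None:
        # Assumption: biases follow the general weight decay when unset.
        solver.WEIGHT_DECAY_BIAS = solver.WEIGHT_DECAY
    return solver


# Stand-in for cfg.SOLVER with the bias weight decay left unset.
solver = SimpleNamespace(WEIGHT_DECAY=0.0001, WEIGHT_DECAY_BIAS=None)
ensure_weight_decay(solver)
print(solver.WEIGHT_DECAY, solver.WEIGHT_DECAY_BIAS)
```

With a real config, something like ensure_weight_decay(cfg.SOLVER) would need to run inside setup(args) before the config is frozen.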

Thank you very much!!

@MajorityRreport Glad to help. Let me know if it still throws the same error.

I also hit the same problem with the official config dyhead_swint_atss_fpn_2x_ms.yaml, even after adding cfg.SOLVER.WEIGHT_DECAY_BIAS and cfg.SOLVER.WEIGHT_DECAY.

@Houseqin I have never seen this error before. Is it interfering with normal operation?

If so, you can try replicating the versions of CUDA, PyTorch, etc. that I listed in the issue description.
It works perfectly for me after the changes mentioned above.

I found that it was a compilation problem and fixed it by following the guidance for "nvcc not found", "Not compiled with GPU support" and "Detectron2 CUDA Compiler: not available" at https://detectron2.readthedocs.io/en/latest/tutorials/install.html#common-installation-issues

Thank you most sincerely.

Hi, could you please tell me what values should be set for cfg.SOLVER.WEIGHT_DECAY_BIAS and cfg.SOLVER.WEIGHT_DECAY? Thanks.

WEIGHT_DECAY: 0.05
WEIGHT_DECAY_BIAS: 0.01
Sorry, I'm just a beginner. I set them like this (arbitrary numbers), and the network could run. But maybe the network didn't match my dataset, so it didn't work well.
I hope this helps.
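
For intuition about what these values do: the traceback line d_p = d_p.add(param, alpha=weight_decay) adds weight_decay times the parameter to its gradient before the update. The sketch below mirrors that arithmetic with plain floats. The function name and the gradient and parameter values are made up for illustration and are not a recommendation for any particular setting.

```python
def sgd_gradient_with_decay(grad, param, weight_decay):
    """Mirror d_p = d_p.add(param, alpha=weight_decay) for plain floats."""
    return grad + weight_decay * param


grad, param = 0.5, 2.0  # made-up gradient and parameter value
for weight_decay in (0.0001, 0.01, 0.05):
    effective = sgd_gradient_with_decay(grad, param, weight_decay)
    print(f"weight_decay={weight_decay}: effective gradient {effective:.4f}")
```

A larger weight decay pulls parameters toward zero more strongly on every step, so the value is usually tuned for the dataset rather than picked arbitrarily.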