RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.

Bingo_ang opened this issue 3 years ago

Hi, Qian~
I really appreciate your great work!
I want to run the demo, but I ran into a problem.

When I ran the command:
python main.py --config configs/CAPE-affineconv_nz64_pose32_clotype32_male.yaml --mode demo

the terminal log was:

Pre-computing mesh pooling matrices ..

loading pre-saved transform matrices...
Building model graph...

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

condition_pose_fc1: (126, 63)
condition_pose_fc2: (63, 32)
condition_clo_label_fc1: (4, 32)
condition_pose_fc1: (126, 63)
condition_pose_fc2: (63, 32)
condition_clo_label_fc1: (4, 32)

------------[Generator]------------
------------Encoder------------
encoder_conv1: (6890, 64), K=2
encoder_conv2: (3445, 64), K=2
encoder_conv3: (3445, 128), K=2
encoder_conv4: (1723, 128), K=2
encoder_conv5: (1723, 256), K=2
encoder_conv6: (862, 256), K=2
encoder_conv7: (862, 512), K=2
encoder_conv8: (862, 512), K=2
encoder_1x1conv: (862, 64), K=1
encoder_fc_mean: (55168, 64)
encoder_fc_logvar: (55168, 64)
------------Decoder------------
decoder_fc1: (128, 55168)
decoder_1x1conv: (862, 512), K=1
decoder_resblock_affine1: (862, 256), K=2
decoder_resblock_affine2: (862, 256), K=2
decoder_resblock_affine3: (1723, 128), K=2
decoder_resblock_affine4: (1723, 128), K=2
decoder_resblock_affine5: (3445, 64), K=2
decoder_resblock_affine6: (3445, 64), K=2
decoder_resblock_affine7: (6890, 32), K=2
decoder_resblock_affine8: (6890, 32), K=2
decoder_output: (6890, 3), K=2

----------[Discriminator]----------
conv1: (3445, 64), K=3
conv2: (1723, 64), K=3
conv3: (862, 128), K=3
conv4: (431, 128), K=3
pred_map: (431, 1), K=3

For generative experiments:
condition_pose_fc1: (126, 63)
condition_pose_fc2: (63, 32)
condition_clo_label_fc1: (4, 32)
------------Encoder------------
encoder_conv1: (6890, 64), K=2
encoder_conv2: (3445, 64), K=2
encoder_conv3: (3445, 128), K=2
encoder_conv4: (1723, 128), K=2
encoder_conv5: (1723, 256), K=2
encoder_conv6: (862, 256), K=2
encoder_conv7: (862, 512), K=2
encoder_conv8: (862, 512), K=2
encoder_1x1conv: (862, 64), K=1
encoder_fc_mean: (55168, 64)
encoder_fc_logvar: (55168, 64)
------------Decoder------------
decoder_fc1: (128, 55168)
decoder_1x1conv: (862, 512), K=1
decoder_resblock_affine1: (862, 256), K=2
decoder_resblock_affine2: (862, 256), K=2
decoder_resblock_affine3: (1723, 128), K=2
decoder_resblock_affine4: (1723, 128), K=2
decoder_resblock_affine5: (3445, 64), K=2
decoder_resblock_affine6: (3445, 64), K=2
decoder_resblock_affine7: (6890, 32), K=2
decoder_resblock_affine8: (6890, 32), K=2
decoder_output: (6890, 3), K=2

2021-03-17 12:20:43.953960: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-03-17 12:20:44.184974: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x558d71c02640 executing computations on platform CUDA. Devices:
2021-03-17 12:20:44.185006: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): TITAN X (Pascal), Compute Capability 6.1
2021-03-17 12:20:44.185014: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): TITAN X (Pascal), Compute Capability 6.1
2021-03-17 12:20:44.203509: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3598130000 Hz
2021-03-17 12:20:44.204166: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x558d71c74fd0 executing computations on platform Host. Devices:
2021-03-17 12:20:44.204198: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2021-03-17 12:20:44.204352: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: TITAN X (Pascal) major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:01:00.0
totalMemory: 11.91GiB freeMemory: 11.12GiB
2021-03-17 12:20:44.204409: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties:
name: TITAN X (Pascal) major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:03:00.0
totalMemory: 11.91GiB freeMemory: 11.77GiB
2021-03-17 12:20:44.206252: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1
2021-03-17 12:20:44.209304: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-17 12:20:44.209335: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 1
2021-03-17 12:20:44.209349: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N Y
2021-03-17 12:20:44.209360: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: Y N
2021-03-17 12:20:44.209479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10813 MB memory) -> physical GPU (device: 0, name: TITAN X (Pascal), pci bus id: 0000:01:00.0, compute capability: 6.1)
2021-03-17 12:20:44.209972: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 11446 MB memory) -> physical GPU (device: 1, name: TITAN X (Pascal), pci bus id: 0000:03:00.0, compute capability: 6.1)
2021-03-17 12:20:45.017010: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally

=============== Running demo: fix z, clotype, change pose ===============

Found 6 different pose, for each we generate 5 samples

2021-03-17 12:20:45.231372: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1
2021-03-17 12:20:45.231460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-17 12:20:45.231470: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 1
2021-03-17 12:20:45.231477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N Y
2021-03-17 12:20:45.231483: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: Y N
2021-03-17 12:20:45.231553: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10813 MB memory) -> physical GPU (device: 0, name: TITAN X (Pascal), pci bus id: 0000:01:00.0, compute capability: 6.1)
2021-03-17 12:20:45.231726: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 11446 MB memory) -> physical GPU (device: 1, name: TITAN X (Pascal), pci bus id: 0000:03:00.0, compute capability: 6.1)
saving results as .obj files to /home/ang/CAPE-master/results/CAPE-affineconv_nz64_pose32_clotype32_male/sample_vary_pose...
Traceback (most recent call last):
  File "main.py", line 109, in <module>
    demos.run()
  File "/home/ang/CAPE-master/demos.py", line 335, in run
    self.sample_vary_pose()
  File "/home/ang/CAPE-master/demos.py", line 164, in sample_vary_pose
    save_obj=self.save_obj, obj_dir=obj_dir)
  File "/home/ang/CAPE-master/demos.py", line 324, in pose_result_onepose_multisample
    self.smpl_model.body_pose[:] = torch.from_numpy(pose_params[i][3:])
RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.

I've searched on Google but still cannot solve it.
I hope you can give me some advice. Thanks a lot!

Best wishes

Qianli Ma · Answer 1 · Sat Mar 20 2021 01:32:26 GMT+0800 (China Standard Time)

Hi, I can't reproduce this in my environment. Which PyTorch version are you using? Could you try torch==1.2?

Bingo_ang · Answer 2 · Sat Mar 20 2021 11:19:21 GMT+0800 (China Standard Time)

@qianlim
Fine, thanks very much!
I created a new virtual environment, ran conda install pytorch==1.2.0 first, and then installed the other dependencies. It finally works~

My suggestion: "requirements.txt" already includes "smplx==0.1.13", so I skipped the installation step "Install smplx python package". Otherwise that step installs the newest PyTorch (1.8), which causes "RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation."

Here are my installation steps (the order of some steps differs from the author's). I hope they help anyone else who runs into the same problem:

conda create -n cape python=3.6
conda activate cape
conda install pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=10.0
pip install -U pip setuptools
pip install -r requirements.txt
Download "psbody_mesh-0.3-cp36-cp36m-linux_x86_64.whl" from https://github.com/MPI-IS/mesh/releases/tag/v0.3
pip install psbody_mesh-0.3-cp36-cp36m-linux_x86_64.whl
Download the SMPL body model and place the .pkl files for both genders in /body_models/smpl/. Follow the author's instructions to remove the Chumpy objects from both model .pkl files, then rename them to "SMPL_MALE.pkl" and "SMPL_FEMALE.pkl".
python main.py --config configs/CAPE-affineconv_nz64_pose32_clotype32_male.yaml --mode demo
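
Before running the demo, it may be worth confirming that the environment really has the versions reported as working in this thread. The following is only a sketch: `EXPECTED` and `check_versions` are illustrative names, not part of the CAPE repository, and the pins are simply the ones quoted above (torch 1.2.0, smplx 0.1.13).

```python
# Versions reported as working in this thread; adjust if the README changes.
EXPECTED = {"torch": "1.2.0", "smplx": "0.1.13"}

try:
    # Available in the standard library from Python 3.8 onwards.
    from importlib.metadata import version, PackageNotFoundError
except ImportError:
    # Python 3.6/3.7 (as in the conda env above): fall back to setuptools.
    from pkg_resources import get_distribution
    from pkg_resources import DistributionNotFound as PackageNotFoundError

    def version(name):
        return get_distribution(name).version


def check_versions(expected, lookup=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, want in expected.items():
        try:
            have = lookup(name)
        except PackageNotFoundError:
            have = None
        # Some wheels report a local suffix such as "1.2.0+cu100";
        # compare only the part before the "+".
        if have is None or have.split("+")[0] != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    for name, (want, have) in check_versions(EXPECTED).items():
        print(f"{name}: expected {want}, found {have}")
```

If the script prints nothing, both packages match the pinned versions. A line for torch showing a newer version would point to the situation described above, where a later install step replaced PyTorch 1.2.0.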

Qianli Ma · Answer 3 · Sun Mar 21 2021 01:30:46 GMT+0800 (China Standard Time)

Thank you very much! I'll update the README in a few days.

Bingo_ang · Answer 4 · Fri Mar 26 2021 22:32:24 GMT+0800 (China Standard Time)

@qianlim Fine! By the way, when will you release the sample code for fitting to images / 3D data? We fans are very much looking forward to it~ :-P

Metareflektor · Answer 5 · Fri Apr 02 2021 14:48:09 GMT+0800 (China Standard Time)

python main.py --config configs/CAPE-affineconv_nz64_pose32_clotype32_male.yaml --mode demo

Thank you for the update regarding the installation. I followed the steps, but I got an error while executing the demo command:

Traceback (most recent call last):
  File "main.py", line 109, in <module>
    demos.run()
  File "C:\Users\workspace\master\models\cape\CAPE-master\demos.py", line 335, in run
    self.sample_vary_pose()
  File "C:\Users\workspace\master\models\cape\CAPE-master\demos.py", line 164, in sample_vary_pose
    save_obj=self.save_obj, obj_dir=obj_dir)
  File "C:\Users\workspace\master\models\cape\CAPE-master\demos.py", line 326, in pose_result_onepose_multisample
    verts_out = self.smpl_model().vertices.detach().cpu().numpy()
  File "C:\Users\miniconda3\envs\cape\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\miniconda3\envs\cape\lib\site-packages\smplx\body_models.py", line 376, in forward
    self.lbs_weights, pose2rot=pose2rot, dtype=self.dtype)
  File "C:\Users\miniconda3\envs\cape\lib\site-packages\smplx\lbs.py", line 179, in lbs
    v_shaped = v_template + blend_shapes(betas, shapedirs)
  File "C:\Users\miniconda3\envs\cape\lib\site-packages\smplx\lbs.py", line 265, in blend_shapes
    blend_shape = torch.einsum('bl,mkl->bmk', [betas, shape_disps])
  File "C:\Users\miniconda3\envs\cape\lib\site-packages\torch\functional.py", line 202, in einsum
    return torch._C._VariableFunctions.einsum(equation, operands)
RuntimeError: size of dimension does not match previous size, operand 1, dim 2

Did you have a similar error, and how did you fix it?

Qianli Ma · Answer 6 · Fri Apr 02 2021 19:17:52 GMT+0800 (China Standard Time)

Hi, which version of the smplx package do you have? You could try smplx==0.1.13, which is the version the code was tested on.

Metareflektor · Answer 7 · Fri Apr 02 2021 19:25:05 GMT+0800 (China Standard Time)

I'm using the recommended package versions (smplx==0.1.13, torch==1.2.0, tensorflow-gpu==1.13.2).

Metareflektor · Answer 8 · Fri Apr 02 2021 20:25:20 GMT+0800 (China Standard Time)

Sorry, it was a different problem. In the blend_shapes function of smplx, shapedirs contained 300 betas instead of 10, because I had mixed up the .pkl files of the SMPL model. Everything is working now. Great work!
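
For anyone hitting the same einsum error: in 'bl,mkl->bmk' the index l is shared by betas and shapedirs, so a shapedirs with 300 entries on its last axis cannot be combined with 10 betas. A quick check of the model file before running the demo could look like the sketch below. The helper names are made up for illustration and are not part of smplx or CAPE, and the sketch assumes that, once the Chumpy objects are removed, the 'shapedirs' entry of the .pkl is a plain array with a `.shape`.

```python
import pickle

EXPECTED_NUM_BETAS = 10  # count reported in this thread for the correct files


def num_betas(model_data):
    """Size of the last axis of 'shapedirs', i.e. the number of shape coefficients."""
    return model_data["shapedirs"].shape[-1]


def check_num_betas(model_data, expected=EXPECTED_NUM_BETAS):
    """Raise ValueError if the model has an unexpected number of betas."""
    found = num_betas(model_data)
    if found != expected:
        raise ValueError(
            "shapedirs has %d betas, expected %d; is this the right .pkl?"
            % (found, expected)
        )
    return found


def load_model_data(pkl_path):
    # encoding="latin1" is the usual way to read these older pickles under
    # Python 3; numpy must be installed for the arrays to unpickle.
    with open(pkl_path, "rb") as f:
        return pickle.load(f, encoding="latin1")
```

Usage would be something like `check_num_betas(load_model_data("body_models/smpl/SMPL_MALE.pkl"))`, where the relative path is an assumption based on the folder and file names mentioned in the installation steps.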

Qianli Ma · Answer 9 · Fri Apr 02 2021 21:26:25 GMT+0800 (China Standard Time)

Thanks for finding out! I'll update the README.