gafniguy / 4D-Facial-Avatars

Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction

Torch Errors

FireElementalNE opened this issue · comments

Hello!

I am trying to get this to work and am getting some weird torch errors. I am fairly new to ML, so I am a bit confused.

To get it running I had to make some changes to nerface_code/nerf-pytorch/nerf/train_utils.py. Hopefully I did not break
something 😅

ray_directions_ablation is used here, but it is not passed when run_one_iter_of_nerf is called here. The YML file in
the README sets options.dataset.no_ndc to True, so the call fails. I also commented out some other lines that seemed to
be used for ablation runs:

  • Following the comment here I commented out the paragraph here
  • commented out a line here
  • changed ray_dirs_fake to None here

I am guessing that these were all for ablation studies?
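An alternative to commenting the ablation code out is to make the ablation input optional, so that callers that never pass it still work. The sketch below only illustrates that pattern: `run_one_iter` is a hypothetical stand-in, not the repository's `run_one_iter_of_nerf`, and its signature is an assumption.

```python
def run_one_iter(ray_directions, ray_directions_ablation=None):
    """Hypothetical stand-in for a training step with an optional ablation input.

    This is NOT the repository's function; the name and signature are
    assumptions used to show the optional-argument pattern.
    """
    if ray_directions_ablation is not None:
        # Ablation path: only taken when the caller passes the extra input.
        return ray_directions_ablation
    # Default path: callers that omit the argument use the real directions.
    return ray_directions


# A caller that knows nothing about the ablation keeps working.
directions = [0.0, 0.0, 1.0]
print(run_one_iter(directions))
```

With a `None` default, regular training runs and ablation runs can share one code path without edits to every call site.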

The final error I am getting is this (I included the stdout from the program, and obfuscated the directory structure in the errors):

before signal registration
after registration
starting data loading
Done with data loading
done loading data
loading GT background to condition on
bg shape torch.Size([256, 256, 3])
should be  torch.Size([256, 256, 3])
initialized latent codes with shape 551 X 32
computing boundix boxes probability maps
Starting loop
  0%|          | 0/1000000 [00:00<?, ?it/s]$HOME/miniconda3/envs/new_nerf/lib/python3.7/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  /opt/conda/conda-bld/pytorch_1634272092750/work/aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  0%|          | 0/1000000 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "$REPODIR/4D-Facial-Avatars/nerface_code/nerf-pytorch/train_transformed_rays.py", line 593, in <module>
    main()
  File "$REPODIR/4D-Facial-Avatars/nerface_code/nerf-pytorch/train_transformed_rays.py", line 398, in main
    loss_total.backward()
  File "$HOME/miniconda3/envs/new_nerf/lib/python3.7/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "$HOME/miniconda3/envs/new_nerf/lib/python3.7/site-packages/torch/autograd/__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2048, 128]], which is output 0 of ReluBackward0, is at version 2; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

I am hoping it is just a versioning issue, but I am not sure.
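For reference, this kind of RuntimeError can be reproduced in isolation. The sketch below is a minimal, assumed reproduction and not code from the repository: it edits the output of a ReLU in place and then calls backward(). ReLU keeps its output for the backward pass, so autograd notices that the saved tensor has changed.

```python
import torch


def inplace_after_relu_fails():
    """Return the error message raised when a ReLU output is edited in place."""
    x = torch.randn(4, 3, requires_grad=True)
    h = torch.relu(x)   # ReLU saves its output for use in the backward pass
    h[:, -1] += 1e-6    # in-place edit changes the version of the saved tensor
    try:
        h.sum().backward()
    except RuntimeError as err:
        return str(err)
    return None


message = inplace_after_relu_fails()
print(message)
```

As the hint in the error suggests, calling torch.autograd.set_detect_anomaly(True) before the training loop should make PyTorch also report the forward operation whose gradient could not be computed.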

This is very cool work and I would love to get it working! I would also like to echo the request for a pretrained model, if there is one floating around somewhere that I (and others) could take a look at!

Thanks!!

I am also facing this error...

I commented out this line, and then it worked. However, I am still waiting for the training result.

sigma_a[:,-1] += 1e-6 # todo commented this for FCB demo !!!!!!

I got the same error as well. Commenting out sigma_a[:,-1] += 1e-6 made no difference for me.
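Instead of deleting the line, the same offset can be applied without modifying the tensor in place. This is a minimal sketch under assumptions: `add_eps_to_last_column` is a hypothetical helper, and `sigma_a` here is a stand-in ReLU output, not the repository's tensor. It only addresses this one in-place operation, so it may not help if the error comes from a different line.

```python
import torch


def add_eps_to_last_column(sigma_a, eps=1e-6):
    """Return a new tensor equal to sigma_a with eps added to its last column.

    Hypothetical helper, not part of the repository.
    """
    offset = torch.zeros_like(sigma_a)
    offset[:, -1] = eps        # in-place write on a fresh tensor autograd does not track
    return sigma_a + offset    # out-of-place add leaves the ReLU output untouched


x = torch.randn(4, 3, requires_grad=True)
sigma_a = torch.relu(x)                     # stand-in for the network's ReLU output
sigma_a = add_eps_to_last_column(sigma_a)   # replaces: sigma_a[:, -1] += 1e-6
sigma_a.sum().backward()                    # backward now runs without the error
print(x.grad.shape)
```

The addition creates a new tensor, so the values that ReLU saved for its backward pass stay unchanged.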

Sorry for leaving all the last minute ablation mess there. You were right to remove/comment out/None anything related to ablation.

I also face this error...

I commented out this line, and then it worked. However, I am still waiting for the training result.

sigma_a[:,-1] += 1e-6 # todo commented this for FCB demo !!!!!!

May I ask how long it took you to run this code? I don't know why, but it shows that I would need to run for over 200 hours.
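As a rough sanity check on that estimate: the progress bar in the log above counts up to 1000000 iterations, so 200 hours corresponds to about 0.72 seconds per iteration. The helper names below are hypothetical, and the calculation assumes a constant iteration rate.

```python
def seconds_per_iter(total_iters, hours):
    """Seconds each iteration takes if total_iters iterations last `hours` hours."""
    return hours * 3600.0 / total_iters


def eta_hours(total_iters, secs_per_iter):
    """Hours needed for total_iters iterations at a constant rate."""
    return total_iters * secs_per_iter / 3600.0


rate = seconds_per_iter(1_000_000, 200)
print(rate)  # 0.72
```

Under the same assumption, `eta_hours` gives the total time for a different iteration count or a faster machine.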