Training Error

Question

Training Error

shilpaullas97 opened this issue 2 months ago · comments

I recently tried out patchfusion model using this repo.
Currently I'm trying to run the training script to train a model by myself.

Following the training steps in [https://github.com/zhyever/PatchFusion/blob/main/docs/user_training.md] , I was able to run coarse and fine model training for depth_anything_vitb model.
But facing the below error while running the training for fusion model.

rank0: File "PatchFusion/estimator/trainer/trainer.py", line 32
6, in run
rank0: self.train_epoch(epoch_idx)
rank0: File "PatchFusion/estimator/trainer/trainer.py", line 25
0, in train_epoch
rank0: self.optimizer_wrapper.update_params(total_loss)
rank0: File "lib/python3.8/site-packages/mmengine/optim/optimizer/optimizer_wrapper.py",
line 196, in update_params
rank0: self.backward(loss)
rank0: File "lib/python3.8/site-packages/mmengine/optim/optimizer/optimizer_wrapper.py",
line 220, in backward
rank0: loss.backward(**kwargs)
rank0: File "lib/python3.8/site-packages/torch/tensor.py", line 525, in backward
rank0: torch.autograd.backward(
rank0: File "lib/python3.8/site-packages/torch/autograd/init.py", line 260, in backwa
rd
rank0: grad_tensors = make_grads(tensors, grad_tensors, is_grads_batched=False)
rank0: File "lib/python3.8/site-packages/torch/autograd/init.py", line 133, in make
grads
rank0: raise RuntimeError(
rank0: RuntimeError: grad can be implicitly created only for scalar outputs

Could you please give some inputs on this?
Is there anything to be modified on the script?

One more question out of this. Do we have any onnx/tensorrt or any other deployment model version for patchfusion?

zhyever · Answer 1 · Mon Jul 01 2024 21:49:22 GMT+0800 (China Standard Time)

Sorry for the late reply. Just came back from one conference. Would you mind to check the shape of the backward loss? Which version of torch are you using?

I will work for deployment models for patchrefiner, which is our follow-up work of patchfusion.