Train error with custom scene

Question

Diodalos opened this issue 4 months ago

Hi, I was trying to train TRIPS on the KITTI dataset and got an error message when starting ./build/bin/train.

I had completed all the steps required before training: dense reconstruction with COLMAP, conversion to the ADOP format with colmap2adop, and point cloud augmentation for ADOP.

The error is a tensor size mismatch: a tensor of size 2 and a tensor of size 1242 (the image width) do not match at dimension 3. The full output is:

register neural render info
register TnnInfo
Git ref: 7e1fc6bf85f817af889a45bcf4ecef513c9c887c
PyTorch version: 1.13.1
The cuDNN version is 8302
cuDNN avail? 1
The CUDA runtime version is 11080
The driver version is 12020
Loading Config File configs/train_normalnet.ini
Using Random Seed: 3746934646
POINT GRADIENTS ARE COMPUTED.
torch::cuda::cudnn_is_available() 1
Render Mode epochs: DT -2 - Fullblend -2 - Fuzzyblend -2 - BilinearBlend -2 - FastBlend 0
Use NeAT reco====================================
Scene Loaded
  Name       kitti_0005
  Path       /home/rvl-urop/UROP/UROP_ksj2/TRIPS/scenes/kitti_0005
  Image Size 1242x375
  Aspect     3.312
  K          704.492 704.492 621 187.5 0 
  ocam       1242x375 affine(1, 0, 0, 0, 0) cam2world() world2cam()
  ocam cut   1
  normalized center 0 0 
  dist       0.00278164 0 0 0 0 0 0 0 
CAM model: CameraModel::PINHOLE_DISTORTION
  Points     804159
  Colors     1
  Normals    1
  Avg. EV  0
  Num Images 154
  Num Cameras 1
Compute scene importance bounding box as 95% of points interval around center of mass
Starting Compute center of mass...center of mass:2.56336 
0.115132 
3.24829 
 Done in 4.16618ms.
Starting Build range vec... Done in 1.10442ms.
Starting Sort range vec... Done in 28.6245ms.
Starting Extend box... Done in 2.93637ms.
Box: AABB: [-5.93634 -2.2858 -4.82504 ] [10.3077 1.23797 11.3726 ]
====================================
Modulo stepsize: 8
Train(134): 1 2 3 4 5 6 7 9 10 11 12 13 14 15 17 18 19 20 21 22 23 25 26 27 28 29 30 31 33 34 35 36 37 38 39 41 42 43 44 45 46 47 49 50 51 52 53 54 55 57 58 59 60 61 62 63 65 66 67 68 69 70 71 73 74 75 76 77 78 79 81 82 83 84 85 86 87 89 90 91 92 93 94 95 97 98 99 100 101 102 103 105 106 107 108 109 110 111 113 114 115 116 117 118 119 121 122 123 124 125 126 127 129 130 131 132 133 134 135 137 138 139 140 141 142 143 145 146 147 148 149 150 151 153
Test(20): 0 8 16 24 32 40 48 56 64 72 80 88 96 104 112 120 128 136 144 152
cropped sampler 1242x375 to 512 x 512 render scale 1
cropped sampler 1242x375 to 512 x 512 render scale 1
full sampler 1242x375 render scale 1
full sampler 1242x375 render scale 1
POINT GRADIENTS ARE COMPUTED.
GPU memory - Point Cloud: 28.9497MB
Pinhole Intrinsics: 704.492 704.492 621 187.5 0  -- 0.00278164 0 0 0 0 0 0 0 
Tensor [1, 13] float cuda:0 Min/Max 0 704.492 Mean 170.576 Sum 2217.49 sdev 293.707 req-grad 0 Min-Coords: [0, 4, ]
GPU memory - Texture [4, 804159] : 19.2998MB
Using Mm Adam texture optimizer
optimizing texture with lr 0.1/0.004
optimizing response with lr 0.0001
optimizing exposure with lr 0.0005
Optimizing with my adam implementation.
optimizing 3D points with lr 0.0001
optimizing point size with lr 0.01
optimizing poses with lr 0.0001
no intrinsics optimizer
POINT GRADIENTS ARE COMPUTED.
Using MultiScaleUnet2dDecOnlySmall with filters: 
32 32 32 32 32 32 32 32 
RENDER NETWORK: NUM PARAMETERS: 22
Total Model Params: 59675
Loading VGG from: loss/traced_caffe_vgg_optim.pt

=== Epoch 0 ===
Scene Log - Texture: Tensor [4, 804159] float cpu Min/Max 4.76837e-07 1 Mean 0.500163 Sum 1.60884e+06 sdev 0.288602 req-grad 1 Min-Coords: [1, 358862, ]
  Background Desc:  
  Confidence per point: Tensor [1, 804159] float cpu Min/Max 0.5 0.5 Mean 0.5 Sum 402080 sdev 0 req-grad 0 Min-Coords: [0, 0, ]
  Confidences under 0.5: 804159
  LayerBuf per point: Tensor [804159, 1] float cuda:0 Min/Max -7.24415 1.07804 Mean -4.95689 Sum -3.98613e+06 sdev 0.684944 req-grad 1 Min-Coords: [484212, 0, ]
      softplus: Tensor [804159, 1] float cuda:0 Min/Max 0.000714085 1.3709 Mean 0.00956212 Sum 7689.47 sdev 0.014579 req-grad 1 Min-Coords: [484212, 0, ]
  Poses: Tensor [154, 8] double cuda:0 Min/Max -7.39978 5.23697 Mean 0.175503 Sum 216.219 sdev 1.37954 req-grad 0 Min-Coords: [153, 6, ]
  Point Position: Tensor [804159, 4] float cuda:0 Min/Max -7.2832 77.185 Mean 1.49127 Sum 4.79687e+06 sdev 3.41098 req-grad 1 Min-Coords: [561302, 0, ]
Eval  0 |   0% |                              |   0/134 [00:00:0000] [0.00 e/s] USING NEW IMPL
terminate called after throwing an instance of 'c10::Error'
  what():  The size of tensor a (2) must match the size of tensor b (1242) at non-singleton dimension 3
Exception raised from infer_size_impl at ../aten/src/ATen/ExpandUtils.cpp:35 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7f3a847693cb in /home/rvl-urop/UROP/UROP_ksj2/TRIPS/External/libtorch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xce (0x7f3a84764d9e in /home/rvl-urop/UROP/UROP_ksj2/TRIPS/External/libtorch/lib/libc10.so)
frame #2: at::infer_size_dimvector(c10::ArrayRef<long>, c10::ArrayRef<long>) + 0x48b (0x7f3a85e895eb in /home/rvl-urop/UROP/UROP_ksj2/TRIPS/External/libtorch/lib/libtorch_cpu.so)
frame #3: at::TensorIteratorBase::compute_shape(at::TensorIteratorConfig const&) + 0x10d (0x7f3a85ef34ad in /home/rvl-urop/UROP/UROP_ksj2/TRIPS/External/libtorch/lib/libtorch_cpu.so)
frame #4: at::TensorIteratorBase::build(at::TensorIteratorConfig&) + 0x69 (0x7f3a85ef4769 in /home/rvl-urop/UROP/UROP_ksj2/TRIPS/External/libtorch/lib/libtorch_cpu.so)
frame #5: at::TensorIteratorBase::build_borrowing_binary_op(at::TensorBase const&, at::TensorBase const&, at::TensorBase const&) + 0xf2 (0x7f3a85ef5f82 in /home/rvl-urop/UROP/UROP_ksj2/TRIPS/External/libtorch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0x2b9e208 (0x7f3a5b1c3208 in /home/rvl-urop/UROP/UROP_ksj2/TRIPS/External/libtorch/lib/libtorch_cuda_cu.so)
frame #7: <unknown function> + 0x2b9e2f3 (0x7f3a5b1c32f3 in /home/rvl-urop/UROP/UROP_ksj2/TRIPS/External/libtorch/lib/libtorch_cuda_cu.so)
frame #8: at::_ops::mul_Tensor::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) + 0x90 (0x7f3a869a1b60 in /home/rvl-urop/UROP/UROP_ksj2/TRIPS/External/libtorch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x37f7939 (0x7f3a88346939 in /home/rvl-urop/UROP/UROP_ksj2/TRIPS/External/libtorch/lib/libtorch_cpu.so)
frame #10: <unknown function> + 0x37f8246 (0x7f3a88347246 in /home/rvl-urop/UROP/UROP_ksj2/TRIPS/External/libtorch/lib/libtorch_cpu.so)
frame #11: at::_ops::mul_Tensor::call(at::Tensor const&, at::Tensor const&) + 0xdb (0x7f3a869fda0b in /home/rvl-urop/UROP/UROP_ksj2/TRIPS/External/libtorch/lib/libtorch_cpu.so)
frame #12: NeuralPipeline::Forward(NeuralScene&, std::vector<std::shared_ptr<TorchFrameData>, std::allocator<std::shared_ptr<TorchFrameData> > >&, at::Tensor, bool, int, bool, float, Eigen::Matrix<float, 3, 1, 0>) + 0x313c (0x7f3ab5ba71dc in /home/rvl-urop/UROP/UROP_ksj2/TRIPS/build/bin/libNeuralPoints.so)
frame #13: <unknown function> + 0xd743c (0x5560810d143c in ./build/bin/train)
frame #14: <unknown function> + 0xe16eb (0x5560810db6eb in ./build/bin/train)
frame #15: <unknown function> + 0x2dac0 (0x556081027ac0 in ./build/bin/train)
frame #16: __libc_start_main + 0xf3 (0x7f3a58057083 in /lib/x86_64-linux-gnu/libc.so.6)
frame #17: <unknown function> + 0x2ef9e (0x556081028f9e in ./build/bin/train)

Aborted (core dumped)
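
For reference, here is a minimal sketch of the broadcasting rule that the failing check (infer_size_impl in the trace) appears to enforce for the mul call inside NeuralPipeline::Forward. It is plain Python, not the TRIPS or libtorch code. The function name broadcast_shape and the example shapes are hypothetical: only the sizes 2 and 1242 and the dimension index 3 come from the log, and I do not know which two tensors are actually being multiplied.

```python
def broadcast_shape(a, b):
    """Return the broadcast shape of two tensor shapes, or raise ValueError.

    Sizes are compared from the trailing dimension backwards. Two sizes are
    compatible if they are equal or if one of them is 1.
    """
    ndim = max(len(a), len(b))
    result = []
    for i in range(1, ndim + 1):
        # Missing leading dimensions behave like size 1.
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(
                f"The size of tensor a ({da}) must match the size of "
                f"tensor b ({db}) at non-singleton dimension {ndim - i}"
            )
        # The size-1 side is stretched to the other side's size.
        result.append(db if da == 1 else da)
    return tuple(reversed(result))


if __name__ == "__main__":
    # Hypothetical shapes: a last dimension of 1 broadcasts against 1242 ...
    print(broadcast_shape((1, 1, 375, 1), (1, 3, 375, 1242)))
    # ... but a last dimension of 2 does not, which is the case in the log.
    try:
        broadcast_shape((1, 1, 375, 2), (1, 3, 375, 1242))
    except ValueError as err:
        print(err)
```

So one operand of the multiplication has 2 entries along its last dimension where the other has the full image width of 1242, and neither side is 1, which is why the two cannot be broadcast.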