zju3dv / Vox-Fusion

Code for "Dense Tracking and Mapping with Voxel-based Neural Implicit Representation", ISMAR 2022



CUDA out of memory.

Xiaxia1997 opened this issue · comments

I am trying to run scannet/scene0059, but I get a CUDA out-of-memory error. Here is the error message:

home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Vox-Fusion/src/tracking.py", line 97, in spin
    self.do_tracking(share_data, current_frame, kf_buffer)
  File "/Vox-Fusion/src/tracking.py", line 128, in do_tracking
    frame_pose, hit_mask = track_frame(
  File "/Vox-Fusion/src/variations/render_helpers.py", line 450, in track_frame
    final_outputs = render_rays(
  File "/Vox-Fusion/src/variations/render_helpers.py", line 223, in render_rays
    samples = ray_sample(intersections, step_size=step_size)
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Vox-Fusion/src/variations/voxel_helpers.py", line 575, in ray_sample
    sampled_idx, sampled_depth, sampled_dists = inverse_cdf_sampling(
  File "/Vox-Fusion/src/variations/voxel_helpers.py", line 292, in forward
    noise = min_depth.new_zeros(*min_depth.size()[:-1], max_steps)
RuntimeError: CUDA out of memory. Tried to allocate 745.06 GiB (GPU 0; 23.70 GiB total capacity; 146.05 MiB already allocated; 11.93 GiB free; 176.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
[W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
^CTraceback (most recent call last):
  File "demo/run.py", line 23, in <module>
    slam.wait_child_processes()
  File "/Vox-Fusion/src/voxslam.py", line 62, in wait_child_processes
    p.join()
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/popen_fork.py", line 47, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/popen_fork.py", line 27, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
Process Process-2:
Traceback (most recent call last):
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Vox-Fusion/src/mapping.py", line 89, in spin
    if not kf_buffer.empty():
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/queues.py", line 123, in empty
    return not self._poll()
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/connection.py", line 424, in _poll
    r = wait([self], timeout)
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/connection.py", line 925, in wait
    selector.register(obj, selectors.EVENT_READ)
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/selectors.py", line 352, in register
    key = super().register(fileobj, events, data)
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/selectors.py", line 235, in register
    if (not events) or (events & ~(EVENT_READ | EVENT_WRITE)):
KeyboardInterrupt
/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 3 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

I ran into the same problem! Hoping for a reply.

This problem may need more intermediate results to diagnose. How do the predicted color and depth maps look? (They can be generated with the render_freq option.)

I don't have the predicted color and depth maps right now; I hope @Xiaxia1997 can provide more information.

I can share what I found: I printed *min_depth.size()[:-1], max_steps and found that max_steps is huge, on the order of 8e8. Checking the source code, the problem may lie with max_distance and min_distance here.
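For intuition, here is a back-of-the-envelope sketch (the helper name and all numbers are illustrative assumptions, not the actual Vox-Fusion code): the noise buffer allocated in inverse_cdf_sampling is roughly num_rays × max_steps float32 values, and max_steps scales with the intersected depth range divided by the step size, so a single spurious far intersection inflates the allocation for the whole batch.

```python
# Back-of-the-envelope sketch (illustrative helper and numbers, not the
# actual Vox-Fusion code): the noise buffer is roughly a
# (num_rays, max_steps) float32 tensor, and max_steps grows with the depth
# range of the farthest intersection divided by the step size.

def estimate_noise_alloc(num_rays, min_depth, max_depth, step_size):
    """Return (max_steps, allocation in GiB) for a float32 noise buffer."""
    max_steps = int((max_depth - min_depth) / step_size)
    gib = num_rays * max_steps * 4 / 2**30  # 4 bytes per float32 element
    return max_steps, gib

# A sane scene: rays span only a few meters, so the buffer is tiny.
print(estimate_noise_alloc(num_rays=1024, min_depth=0.1, max_depth=8.0, step_size=0.05))

# A single spurious intersection at a huge distance drives max_steps to
# roughly 8e8 and the buffer to hundreds of GiB, the same order of
# magnitude as the 745 GiB allocation in the traceback above.
print(estimate_noise_alloc(num_rays=256, min_depth=0.1, max_depth=4.0e7, step_size=0.05))
```

This matches the observation that max_steps printed as ~8e8: the allocation request is dominated by one absurdly large depth range, not by the number of rays.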

I encounter this error whenever a loop occurs during tracking. It seems the rays intersect very distant voxels, which makes the maximum distance very large.

I wonder how this problem can be solved. I see a max_depth value in the config; maybe voxels beyond max_depth should be ignored?
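One way to sketch that idea (a hypothetical helper operating on plain (near, far) pairs, not the actual Vox-Fusion fix, which would work on intersection tensors): drop intersections whose near depth already exceeds the configured max_depth, and clamp the far depth of the rest, before the sample count is derived.

```python
def clamp_intersections(intersections, max_depth_cfg):
    """Hypothetical helper: limit (near, far) ray/voxel intersection depths
    to the configured max_depth before sample counts are computed."""
    clamped = []
    for near, far in intersections:
        if near >= max_depth_cfg:
            continue  # the whole voxel lies beyond the depth limit: ignore it
        clamped.append((near, min(far, max_depth_cfg)))
    return clamped

hits = [(0.5, 2.0), (1.0, 9.5), (12.0, 40.0)]
print(clamp_intersections(hits, 8.0))  # the intersection starting at 12.0 is dropped
```

With the far depths bounded this way, max_steps stays proportional to the configured max_depth rather than to the farthest spurious hit.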