alievk / npbg

Neural Point-Based Graphics


Scene.program is None

chekirou opened this issue · comments

Hi,
I am trying to fit a scene and I have a problem with the dataloader. At the second epoch, even though the dataset is loaded, scene.program seems to be None. Do you have any idea where the problem could be?

Here is the error message:

EPOCH 1

TRAIN
EVAL MODE IN TRAIN
model parameters: 1928771
running on datasets [0]
proj_matrix was not set
total parameters: 76715531
Traceback (most recent call last):
  File "train.py", line 517, in <module>
    train_loss = run_train(epoch, pipeline, args, iter_cb)
  File "train.py", line 253, in run_train
    return run_epoch(pipeline, 'train', epoch, args, iter_cb=iter_cb)
  File "train.py", line 228, in run_epoch
    run_sub(dl, extra_optimizer)
  File "train.py", line 118, in run_sub
    for it, data in enumerate(dl):
  File "C:\Users\user\.conda\envs\npbg\lib\site-packages\torch\utils\data\dataloader.py", line 517, in __next__
    data = self._next_data()
  File "C:\Users\user\.conda\envs\npbg\lib\site-packages\torch\utils\data\dataloader.py", line 557, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "C:\Users\user\.conda\envs\npbg\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\user\.conda\envs\npbg\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\user\.conda\envs\npbg\lib\site-packages\torch\utils\data\dataset.py", line 219, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "C:\Users\user\Documents\npbg\npbg\datasets\dynamic.py", line 246, in __getitem__
    input_ = self.renderer.render(view_matrix=view_matrix, proj_matrix=proj_matrix)
  File "C:\Users\user\Documents\npbg\npbg\datasets\dynamic.py", line 68, in render
    self.scene.set_camera_view(view_matrix)
  File "C:\Users\user\Documents\npbg\npbg\gl\programs.py", line 366, in set_camera_view
    self.program['m_view'] = inv(m).T

Solved by deleting the renderer in the unload function.

Hi.

I'm facing the same issue. Could you please elaborate on what exactly you did to solve the issue? Which unload function are you talking about?

Thanks in advance

Hi,

@alievk @seva100 do you have any pointers for solving this issue? I deleted the renderer and set it to None in the unload function (line 182 in /npbg/datasets/dynamic.py). While this solves the problem and I am able to train beyond the first epoch, the CPU RAM and GPU memory usage increase continuously, so I am not able to train for more than 20 epochs at a time.
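For reference, a minimal sketch of the workaround described above. The class and attribute names loosely mirror npbg/datasets/dynamic.py but are assumptions for illustration, not the exact upstream code; the idea is just that unload() must drop the renderer reference so a fresh one (with a valid scene.program) is rebuilt on the next epoch's load:

```python
# Hedged sketch of the workaround: drop the renderer in unload() so that
# the next load() recreates it instead of reusing a scene whose GL program
# has already been torn down. "DynamicDataset" is a hypothetical stand-in.

class DynamicDataset:
    """Minimal stand-in for the dataset that owns an OpenGL renderer."""

    def __init__(self):
        self.renderer = object()  # placeholder for the real scene renderer

    def unload(self):
        # The original unload presumably releases GL resources here; the
        # fix is to also delete the stale renderer reference and reset it,
        # so __getitem__ never calls render() on a dead scene.program.
        del self.renderer
        self.renderer = None

ds = DynamicDataset()
ds.unload()
assert ds.renderer is None
```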

Thanks in advance

Hi @Shubhendu-Jena, unfortunately, I have never stumbled upon this issue. I can only suggest saving checkpoints regularly and restarting training from the latest checkpoint after it fails because of memory overflow...
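A rough sketch of that suggestion, i.e. checkpointing regularly and resuming from the latest checkpoint after a memory-related crash. In real training torch.save/torch.load would be used on the model and optimizer state dicts; pickle stands in here so the sketch is dependency-free, and the directory layout and dict keys are assumptions:

```python
# Hedged sketch: save a checkpoint every epoch, and on restart pick up the
# most recent one. pickle stands in for torch.save/torch.load.
import os
import pickle

CKPT_DIR = "checkpoints"  # assumed location, not from the npbg repo

def save_checkpoint(epoch, state):
    """Persist the training state for one epoch; returns the file path."""
    os.makedirs(CKPT_DIR, exist_ok=True)
    path = os.path.join(CKPT_DIR, f"epoch_{epoch:04d}.pkl")
    with open(path, "wb") as f:
        pickle.dump({"epoch": epoch, "state": state}, f)
    return path

def load_latest_checkpoint():
    """Return the most recent checkpoint dict, or None if there is none."""
    if not os.path.isdir(CKPT_DIR):
        return None
    ckpts = sorted(p for p in os.listdir(CKPT_DIR) if p.endswith(".pkl"))
    if not ckpts:
        return None
    with open(os.path.join(CKPT_DIR, ckpts[-1]), "rb") as f:
        return pickle.load(f)
```

The zero-padded epoch number in the filename makes a plain lexical sort return the newest checkpoint, so resuming after a crash is just load_latest_checkpoint() and continuing from checkpoint["epoch"] + 1.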