Can anyone briefly explain the sliding window and streaming video training modes?
sean-wade opened this issue · comments
I observe that in sliding window mode, with T=8 and self.num_frame_head_grads = self.num_frame_losses = 2, the loop calls forward_pts_train without computing any loss for the first 6 iterations.
Doesn't this incur unnecessary computation?
```python
def obtain_history_memory(self, ...):
    ...
    for i in range(T):
        requires_grad = False
        return_losses = False
        # slice out frame i for every tensor in the batch
        data_t = dict()
        for key in data:
            data_t[key] = data[key][:, i]
        data_t['img_feats'] = data_t['img_feats']
        # only the last num_frame_head_grads frames run the head with gradients
        if i >= num_nograd_frames:
            requires_grad = True
        # only the last num_frame_losses frames contribute to the loss
        if i >= num_grad_losses:
            return_losses = True
        loss = self.forward_pts_train(......)
```
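The flag logic above can be isolated into a small standalone sketch (plain Python; `frame_schedule` is a hypothetical helper, not part of the repo) to show exactly which frames get gradients and losses:

```python
def frame_schedule(T, num_frame_head_grads, num_frame_losses):
    """Return (frame_index, requires_grad, return_losses) for each frame,
    mirroring the threshold logic in obtain_history_memory."""
    num_nograd_frames = T - num_frame_head_grads
    num_grad_losses = T - num_frame_losses
    schedule = []
    for i in range(T):
        requires_grad = i >= num_nograd_frames   # head runs with grad
        return_losses = i >= num_grad_losses     # loss is computed
        schedule.append((i, requires_grad, return_losses))
    return schedule

# With T=8 and both hyperparameters set to 2, frames 0-5 are
# forward-only warm-up and frames 6-7 are supervised.
schedule = frame_schedule(8, 2, 2)
```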
After reading the code, I think I understand the reason:
The first 6 iterations let the head accumulate temporal memory, and only the last 2 iterations are supervised.
Is my understanding correct?
@sean-wade Sorry for the late response. You are right: only the last 2 frames are supervised in sliding-window training. Temporal modeling in StreamPETR is recurrent, so a long window size is necessary to mitigate error propagation.
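The point about recurrence can be seen in a toy sketch (plain Python; `run_window` and the list-based memory are illustrative stand-ins, not the actual propagated query/memory bank): the memory is updated on every frame, so the warm-up frames are not wasted even though they produce no loss.

```python
def run_window(frames, num_frame_losses):
    """Accumulate memory over all frames; 'supervise' only the last ones."""
    memory = []       # stand-in for the recurrently propagated memory
    supervised = []
    for i, frame in enumerate(frames):
        memory.append(frame)                      # updated on every frame
        if i >= len(frames) - num_frame_losses:   # loss only on last frames
            supervised.append((i, list(memory)))
    return supervised

# A window of 8 frames with 2 supervised frames: by the time frame 6 is
# supervised, the memory already holds the 6 warm-up frames.
out = run_window(list("abcdefgh"), 2)
```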