exiawsh / StreamPETR

[ICCV 2023] StreamPETR: Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection

Can anyone simply explain about the sliding window and streaming video?

sean-wade opened this issue · comments

I observe that in sliding-window mode, say T=8 with self.num_frame_head_grads = self.num_frame_losses = 2, the for loop calls forward_pts_train without computing any loss for the first 6 iterations.

Doesn't this incur unnecessary computation cost?

```python
def obtain_history_memory(self, **data):
    ...
    for i in range(T):
        requires_grad = False
        return_losses = False
        # slice out the i-th frame of every tensor in the window
        data_t = dict()
        for key in data:
            data_t[key] = data[key][:, i]

        # only the last num_frame_head_grads frames run with gradients,
        # and only the last num_frame_losses frames return a loss
        if i >= num_nograd_frames:
            requires_grad = True
        if i >= num_grad_losses:
            return_losses = True
        loss = self.forward_pts_train(...)
```
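To make the flag schedule concrete, here is a small standalone sketch (the variable values are the ones from the question, T=8 and num_frame_head_grads = num_frame_losses = 2; it only reproduces the flag logic, not the actual forward pass):

```python
# Sketch of the sliding-window flag schedule: with T=8 and
# num_frame_head_grads = num_frame_losses = 2, the first 6 frames run
# without gradients or losses, and only the last 2 are supervised.
T = 8
num_frame_head_grads = 2
num_frame_losses = 2

num_nograd_frames = T - num_frame_head_grads  # 6
num_grad_losses = T - num_frame_losses        # 6

schedule = []
for i in range(T):
    requires_grad = i >= num_nograd_frames
    return_losses = i >= num_grad_losses
    schedule.append((i, requires_grad, return_losses))

for i, grad, loss in schedule:
    print(f"frame {i}: grad={grad}, loss={loss}")
```

Frames 0-5 come out with grad=False, loss=False (memory warm-up only), and frames 6-7 with grad=True, loss=True (supervised).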

After reading the code, I think I understand the reason:
the first 6 loops are used to let the head accumulate memory, and only the last 2 loops are supervised.
Is my understanding right?

@sean-wade Sorry for the late response. You are right: only the last 2 frames are supervised in sliding-window training. Temporal modeling in StreamPETR works in a recurrent manner, so it is important to mitigate error propagation, and a long window size is therefore necessary.
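The recurrent point can be illustrated with a toy sketch (this is not StreamPETR's actual memory implementation; the queue size and update rule here are made up for illustration): the memory is a running state updated every frame, so even the unsupervised warm-up frames shape the state that the supervised frames finally see.

```python
# Toy recurrent memory: a bounded queue of the most recent per-frame
# features. Warm-up frames still update it, which is why they are run
# at all even though they produce no loss.
def update_memory(memory, frame_feat, keep=3):
    # prepend the newest frame's features, keep only the `keep` latest
    return ([frame_feat] + memory)[:keep]

memory = []
frames = [f"feat_{i}" for i in range(8)]  # an 8-frame window, as in T=8
for feat in frames:
    memory = update_memory(memory, feat)

print(memory)  # the state the final (supervised) frames would consume
```

By the time the last 2 frames are processed, the memory already reflects the 6 warm-up frames, so supervision happens against a realistic streaming state.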