Storife / RANet

RANet: Ranking Attention Network for Fast Video Object Segmentation (VOS), ICCV 2019


Question About Test Time?

CocoRLin opened this issue · comments

Hi!
I ran this code on 1 GPU (Nvidia 1080 Ti). I think your code processes 1 to 4 frames at a time, right? But when I measure the forward pass (only the time around `outputs, _ = model(*inputs)`), it takes about 230 ms with batch size 4, which is about 60 ms per frame. That differs from your paper... Am I doing something wrong?

Here is my code (I added timing code in RANet_lib.py):

```python
import time
import torch

# Assemble the batch: image, key-frame feature, key-frame mask, previous mask
inputs = [torch.cat(Img)[index_select], torch.cat(KFea)[index_select],
          torch.cat(KMsk)[index_select], torch.cat(PMsk)[index_select]]

torch.cuda.synchronize()        # wait for any pending GPU work before timing
stime = time.time()
outputs, _ = model(*inputs)     # forward pass only
torch.cuda.synchronize()        # wait until the forward pass has finished
etime = time.time()

model_time = (etime - stime) * 1000  # milliseconds
print('time:', model_time)
```

Hi,
Thanks for your question.

Your code is fine. To investigate this, we tested the speed again using the default settings on two kinds of GPUs, a 1080 Ti and a 2080 Ti, with PyTorch 1.0.1 and CUDA 10.0.
Here are the latest test results (in seconds per frame).

| Batch size | 1 | 2 | 4 |
| --- | --- | --- | --- |
| 1080 Ti | 0.045 | 0.040 | 0.038 |
| 2080 Ti | 0.037 | 0.025 | 0.023 |

However, it seems the speed cannot reach the speed reported in the original paper when using batch size 1. The reason is that I did not call `torch.cuda.synchronize()` when doing the speed test for the paper, so the measured time was shorter than the model actually takes: without synchronization, the CPU timer stops before the GPU has finished the forward pass. I'm terribly sorry about this problem, and we will correct it soon.

@Storife
Hi, what is the effect of `torch.cuda.synchronize()`? Do you mean that disabling it will make the network inference faster?

@Guptajakala
No, but the measured speed will look faster if the CPU does not wait for the GPU to finish.
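
For background: CUDA kernel launches in PyTorch are asynchronous, so a CPU-side timer stopped right after the forward call only measures the launch overhead, not the actual GPU computation. Here is a minimal sketch illustrating the difference; it uses a standalone toy model, not the code from this repo:

```python
import time
import torch

# Toy stand-in for the network; any CUDA module shows the same effect.
model = torch.nn.Conv2d(64, 64, kernel_size=3, padding=1).cuda()
x = torch.randn(8, 64, 256, 256, device='cuda')

# Warm-up so one-time setup costs do not pollute the measurements.
y = model(x)
torch.cuda.synchronize()

# Without synchronization: the timer stops as soon as the kernels are
# queued, long before the GPU has finished computing.
t0 = time.time()
y = model(x)
t_no_sync = time.time() - t0

torch.cuda.synchronize()  # drain the queue before the fair measurement

# With synchronization: the timer stops only after the GPU is done.
t0 = time.time()
y = model(x)
torch.cuda.synchronize()
t_sync = time.time() - t0

print(f'without sync: {t_no_sync * 1000:.2f} ms  (misleadingly small)')
print(f'with sync:    {t_sync * 1000:.2f} ms  (true forward time)')
```

The first number only reflects how quickly the CPU can queue work, while the second reflects the real forward time, which is why the paper's unsynchronized timing looked faster than the model actually runs.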