FPS
gbliao opened this issue
Thanks for your great work! We are very interested in the FPS mentioned in the paper. However, after running the code in speed_test.py, we have the following questions.
(1) The paper mentions testing on a single 2080Ti GPU and reports 450 FPS, but actually running the self-defined tensor with a batch of 20 in speed_test.py on a single 2080Ti runs out of memory. So, how can we get the 450 FPS on a single 2080Ti GPU?
(2) We found that a self-defined tensor is used for the inference speed test in speed_test.py, which differs from the process in test.py that uses actual images. The comparison is perhaps unfair. So we are very curious what the exact FPS would be in test.py with a batch of 1. And what about including torch.cuda.synchronize() (see the sketch below)?
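By "including torch.cuda.synchronize()" we mean a timing loop roughly like the following sketch (variable names are illustrative; `model` stands for the MobileSal network as loaded in speed_test.py):

```python
from time import time
import torch

# Rough sketch of a synchronized batch-1 timing (illustrative; `model` is
# assumed to be the MobileSal network loaded as in speed_test.py).
x = torch.randn(1, 3, 320, 320).cuda()  # RGB input
y = torch.randn(1, 1, 320, 320).cuda()  # depth input

with torch.no_grad():
    for _ in range(50):                  # warm-up iterations
        model(x, y)
    torch.cuda.synchronize()             # make sure warm-up kernels finish
    start = time()
    for _ in range(100):
        model(x, y)
    torch.cuda.synchronize()             # wait for all queued kernels before stopping the clock
    elapsed = time() - start

print("FPS:", 100 / elapsed)
```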
Moreover, can you provide the FLOPs of the MobileSal model? We think this result would make the comparison more comprehensive. Thank you very much!
Thanks for your timely reply! Our running environment:
Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Linux version 4.15.0-45-generic (buildd@lcy01-amd64-027) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)) #48~16.04.1-Ubuntu SMP Tue Jan 29 18:03:48 UTC 2019
python 3.6.7
torch 1.5.1+cu101
torch-nightly 1.0.0.dev20181123
torchstat 0.0.7
torchvision 0.6.1+cu101
I tested the GFLOPS of MobileSal a long time ago. With an input size of 224×224, it is about 0.4 GFLOPS. Since I am currently on vacation, I will test MobileSal again and give the precise GFLOPS number next week. Thank you for your patience!
Thanks! We are curious about the GFLOPs with an input of 320x320, as mentioned in the paper. We can discuss the above issues together after the holidays. Have a nice holiday!
The GFLOPS and FPS numbers for different input sizes are as below:

| Input Size | GFLOPS | PyTorch FPS (bs=20, fp32, RTX 2080Ti) |
|---|---|---|
| 224 x 224 | 0.76G | 900 |
| 320 x 320 | 1.56G | 450 |
I am also going to re-examine the speed issue in the specific environment you listed (python=3.6, torch=1.5.1+cu101, torchvision=0.6.1+cu101). Please be patient :)
Thanks for your reply. We have tried testing the FLOPs with an input of 320 x 320 using thop.profile (the call we used is sketched after the log below), and the results are as follows.
[INFO] Register count_convNd() for <class 'torch.nn.modules.conv.Conv2d'>.
[INFO] Register count_bn() for <class 'torch.nn.modules.batchnorm.BatchNorm2d'>.
[INFO] Register zero_ops() for <class 'torch.nn.modules.activation.ReLU6'>.
[WARN] Cannot find rule for <class 'MobileNetV2.ConvBNReLU'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class 'torch.nn.modules.container.Sequential'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class 'MobileNetV2.InvertedResidual'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class 'MobileNetV2.MobileNetV2'>. Treat it as zero Macs and zero Params.
[INFO] Register zero_ops() for <class 'torch.nn.modules.activation.ReLU'>.
[WARN] Cannot find rule for <class '__main__.ConvBNReLU'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class '__main__.InvertedResidual'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class '__main__.DepthNet'>. Treat it as zero Macs and zero Params.
[INFO] Register count_linear() for <class 'torch.nn.modules.linear.Linear'>.
[WARN] Cannot find rule for <class '__main__.DepthFuseNet'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class 'torch.nn.modules.container.ModuleList'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class '__main__.IDR'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class 'torch.nn.modules.activation.Sigmoid'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class '__main__.CPR'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class '__main__.Fusion'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class '__main__.CPRDecoder'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class '__main__.MobileSal'>. Treat it as zero Macs and zero Params.
FLOPs: 1.988173288 G
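The thop call we used was roughly the following sketch (illustrative; `model` is the MobileSal network loaded as in speed_test.py):

```python
import torch
from thop import profile

# Sketch of the measurement we ran; thop's profile() returns an op count
# based on multiply-accumulates for registered layers plus the parameter count.
x = torch.randn(1, 3, 320, 320).cuda()  # RGB input
y = torch.randn(1, 1, 320, 320).cuda()  # depth input
ops, params = profile(model, inputs=(x, y))
print("FLOPs:", ops / 1e9, "G")
```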
Moreover, we are very curious what the exact FPS result would be in test.py with a batch of 1. We set the code in speed_test.py as follows; is this the right setting?
# Excerpt from our modified speed_test.py (imports as in the original file)
from time import time
from tqdm import tqdm
import torch

x = torch.randn(1, 3, 320, 320).cuda()
y = torch.randn(1, 1, 320, 320).cuda()
######################################
#### PyTorch Test [BatchSize 1] #####
######################################
for i in tqdm(range(50)):
    # warm up
    p = model(x, y)
total_t = 0
for i in tqdm(range(100)):
    start = time()
    p = model(x, y)
    total_t += time() - start
print("FPS", 100 / total_t * 1)  # batch size 1
However, we only achieve 54~60 FPS under this setting.
If you use the PyTorch backend with batch size 20, your GPU shows a high utilization rate (nvidia-smi), while GPU utilization drops to ~20% if you use the PyTorch backend with batch size 1. This behavior does not appear in methods with regular backbones.
So, you need to increase the batch size to raise GPU utilization to a favorable level. A test with the PyTorch backend and batch size 20 can achieve 450 FPS. If you use the TensorRT backend, you can get a comparable speed with batch size 1.
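For reference, a batch-20 throughput measurement along these lines would look roughly like the sketch below (variable names are illustrative; the released speed_test.py is the reference implementation):

```python
from time import time
import torch

# Rough sketch of a batch-20 throughput test; `model` is assumed to be
# the MobileSal network loaded as in speed_test.py.
batch_size, iters = 20, 100
x = torch.randn(batch_size, 3, 320, 320).cuda()
y = torch.randn(batch_size, 1, 320, 320).cuda()

with torch.no_grad():
    for _ in range(50):              # warm-up
        model(x, y)
    torch.cuda.synchronize()
    start = time()
    for _ in range(iters):
        model(x, y)
    torch.cuda.synchronize()
    total_t = time() - start

# Throughput counts every image in the batch, not every forward call.
print("FPS:", batch_size * iters / total_t)
```

The point is that throughput FPS counts batch_size × iterations images, so a larger batch keeps the GPU busy and raises the measured FPS.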
On the other hand, the log you showed me seems to reveal that your code is computing MAdd, which is about twice the FLOPS. Meanwhile, the computational cost of some ops seems not to be counted (see the warnings in the log). So you are not computing the real FLOPS number. I recommend using torchsummary to compute the GFLOPS, since it gives a specific computational cost for each layer. For convenience, I show the detailed computational cost (MAdd) of each layer reported by torchsummary below:
loading imagenet pretrained mobilenetv2
loaded imagenet pretrained mobilenetv2
Using GPU: 0
Input size: [3, 320, 320]
Using COMPLEX summary mode:
module name input shape output shape parameter quantity inference memory(MB) MAdd duration percent
0 backbone.features.0 3 320 320 32 160 160 928 9.38MB 47,513,600 3.97%
1 backbone.features.1.conv.0 32 160 160 32 160 160 352 9.38MB 18,022,400 0.11%
2 backbone.features.1.conv.1_Conv2d 32 160 160 16 160 160 512 1.56MB 25,804,800 2.16%
3 backbone.features.1.conv.2_BatchNorm2d 16 160 160 16 160 160 32 1.56MB 1,638,400 0.05%
4 backbone.features.2.conv.0 16 160 160 96 160 160 1728 28.12MB 88,473,600 4.36%
5 backbone.features.2.conv.1 96 160 160 96 80 80 1056 7.03MB 13,516,800 0.10%
6 backbone.features.2.conv.2_Conv2d 96 80 80 24 80 80 2304 0.59MB 29,337,600 2.31%
7 backbone.features.2.conv.3_BatchNorm2d 24 80 80 24 80 80 48 0.59MB 614,400 0.07%
8 backbone.features.3.conv.0 24 80 80 144 80 80 3744 10.55MB 47,923,200 3.07%
9 backbone.features.3.conv.1 144 80 80 144 80 80 1584 10.55MB 20,275,200 0.09%
10 backbone.features.3.conv.2_Conv2d 144 80 80 24 80 80 3456 0.59MB 44,083,200 1.81%
11 backbone.features.3.conv.3_BatchNorm2d 24 80 80 24 80 80 48 0.59MB 614,400 0.06%
12 backbone.features.4.conv.0 24 80 80 144 80 80 3744 10.55MB 47,923,200 0.11%
13 backbone.features.4.conv.1 144 80 80 144 40 40 1584 2.64MB 5,068,800 0.32%
14 backbone.features.4.conv.2_Conv2d 144 40 40 32 40 40 4608 0.20MB 14,694,400 1.25%
15 backbone.features.4.conv.3_BatchNorm2d 32 40 40 32 40 40 64 0.20MB 204,800 0.04%
16 backbone.features.5.conv.0 32 40 40 192 40 40 6528 3.52MB 20,889,600 0.89%
17 backbone.features.5.conv.1 192 40 40 192 40 40 2112 3.52MB 6,758,400 0.07%
18 backbone.features.5.conv.2_Conv2d 192 40 40 32 40 40 6144 0.20MB 19,609,600 0.75%
19 backbone.features.5.conv.3_BatchNorm2d 32 40 40 32 40 40 64 0.20MB 204,800 0.08%
20 backbone.features.6.conv.0 32 40 40 192 40 40 6528 3.52MB 20,889,600 0.14%
21 backbone.features.6.conv.1 192 40 40 192 40 40 2112 3.52MB 6,758,400 0.11%
22 backbone.features.6.conv.2_Conv2d 192 40 40 32 40 40 6144 0.20MB 19,609,600 0.05%
23 backbone.features.6.conv.3_BatchNorm2d 32 40 40 32 40 40 64 0.20MB 204,800 0.04%
24 backbone.features.7.conv.0 32 40 40 192 40 40 6528 3.52MB 20,889,600 0.13%
25 backbone.features.7.conv.1 192 40 40 192 20 20 2112 0.88MB 1,689,600 0.10%
26 backbone.features.7.conv.2_Conv2d 192 20 20 64 20 20 12288 0.10MB 9,804,800 0.48%
27 backbone.features.7.conv.3_BatchNorm2d 64 20 20 64 20 20 128 0.10MB 102,400 0.04%
28 backbone.features.8.conv.0 64 20 20 384 20 20 25344 1.76MB 20,275,200 0.52%
29 backbone.features.8.conv.1 384 20 20 384 20 20 4224 1.76MB 3,379,200 0.47%
30 backbone.features.8.conv.2_Conv2d 384 20 20 64 20 20 24576 0.10MB 19,635,200 0.57%
31 backbone.features.8.conv.3_BatchNorm2d 64 20 20 64 20 20 128 0.10MB 102,400 0.04%
32 backbone.features.9.conv.0 64 20 20 384 20 20 25344 1.76MB 20,275,200 0.30%
33 backbone.features.9.conv.1 384 20 20 384 20 20 4224 1.76MB 3,379,200 0.54%
34 backbone.features.9.conv.2_Conv2d 384 20 20 64 20 20 24576 0.10MB 19,635,200 0.04%
35 backbone.features.9.conv.3_BatchNorm2d 64 20 20 64 20 20 128 0.10MB 102,400 0.03%
36 backbone.features.10.conv.0 64 20 20 384 20 20 25344 1.76MB 20,275,200 0.08%
37 backbone.features.10.conv.1 384 20 20 384 20 20 4224 1.76MB 3,379,200 0.83%
38 backbone.features.10.conv.2_Conv2d 384 20 20 64 20 20 24576 0.10MB 19,635,200 0.03%
39 backbone.features.10.conv.3_BatchNorm2d 64 20 20 64 20 20 128 0.10MB 102,400 0.03%
40 backbone.features.11.conv.0 64 20 20 384 20 20 25344 1.76MB 20,275,200 0.24%
41 backbone.features.11.conv.1 384 20 20 384 20 20 4224 1.76MB 3,379,200 0.06%
42 backbone.features.11.conv.2_Conv2d 384 20 20 96 20 20 36864 0.15MB 29,452,800 0.82%
43 backbone.features.11.conv.3_BatchNorm2d 96 20 20 96 20 20 192 0.15MB 153,600 0.03%
44 backbone.features.12.conv.0 96 20 20 576 20 20 56448 2.64MB 45,158,400 1.33%
45 backbone.features.12.conv.1 576 20 20 576 20 20 6336 2.64MB 5,068,800 0.25%
46 backbone.features.12.conv.2_Conv2d 576 20 20 96 20 20 55296 0.15MB 44,198,400 2.44%
47 backbone.features.12.conv.3_BatchNorm2d 96 20 20 96 20 20 192 0.15MB 153,600 0.03%
48 backbone.features.13.conv.0 96 20 20 576 20 20 56448 2.64MB 45,158,400 0.39%
49 backbone.features.13.conv.1 576 20 20 576 20 20 6336 2.64MB 5,068,800 0.26%
50 backbone.features.13.conv.2_Conv2d 576 20 20 96 20 20 55296 0.15MB 44,198,400 0.03%
51 backbone.features.13.conv.3_BatchNorm2d 96 20 20 96 20 20 192 0.15MB 153,600 0.03%
52 backbone.features.14.conv.0 96 20 20 576 20 20 56448 2.64MB 45,158,400 0.54%
53 backbone.features.14.conv.1 576 20 20 576 10 10 6336 0.66MB 1,267,200 0.06%
54 backbone.features.14.conv.2_Conv2d 576 10 10 160 10 10 92160 0.06MB 18,416,000 1.48%
55 backbone.features.14.conv.3_BatchNorm2d 160 10 10 160 10 10 320 0.06MB 64,000 0.06%
56 backbone.features.15.conv.0 160 10 10 960 10 10 155520 1.10MB 31,104,000 2.87%
57 backbone.features.15.conv.1 960 10 10 960 10 10 10560 1.10MB 2,112,000 0.13%
58 backbone.features.15.conv.2_Conv2d 960 10 10 160 10 10 153600 0.06MB 30,704,000 2.30%
59 backbone.features.15.conv.3_BatchNorm2d 160 10 10 160 10 10 320 0.06MB 64,000 0.03%
60 backbone.features.16.conv.0 160 10 10 960 10 10 155520 1.10MB 31,104,000 0.07%
61 backbone.features.16.conv.1 960 10 10 960 10 10 10560 1.10MB 2,112,000 0.52%
62 backbone.features.16.conv.2_Conv2d 960 10 10 160 10 10 153600 0.06MB 30,704,000 0.02%
63 backbone.features.16.conv.3_BatchNorm2d 160 10 10 160 10 10 320 0.06MB 64,000 0.02%
64 backbone.features.17.conv.0 160 10 10 960 10 10 155520 1.10MB 31,104,000 0.05%
65 backbone.features.17.conv.1 960 10 10 960 10 10 10560 1.10MB 2,112,000 0.44%
66 backbone.features.17.conv.2_Conv2d 960 10 10 320 10 10 307200 0.12MB 61,408,000 2.60%
67 backbone.features.17.conv.3_BatchNorm2d 320 10 10 320 10 10 640 0.12MB 128,000 0.03%
68 backbone.features.18 320 10 10 1280 10 10 412160 1.46MB 82,432,000 3.39%
69 depthnet.features.0.conv.0 1 320 320 1 160 160 12 0.29MB 588,800 2.08%
70 depthnet.features.0.conv.1_Conv2d 1 160 160 16 160 160 16 1.56MB 409,600 0.42%
71 depthnet.features.0.conv.2_BatchNorm2d 16 160 160 16 160 160 32 1.56MB 1,638,400 0.03%
72 depthnet.features.1.conv.0 16 160 160 16 160 160 192 4.69MB 9,420,800 0.06%
73 depthnet.features.1.conv.1_Conv2d 16 160 160 16 160 160 256 1.56MB 12,697,600 0.93%
74 depthnet.features.1.conv.2_BatchNorm2d 16 160 160 16 160 160 32 1.56MB 1,638,400 0.03%
75 depthnet.features.2.conv.0 16 160 160 64 160 160 1216 18.75MB 60,620,800 3.78%
76 depthnet.features.2.conv.1 64 160 160 64 80 80 768 4.69MB 9,420,800 0.08%
77 depthnet.features.2.conv.2_Conv2d 64 80 80 32 80 80 2048 0.78MB 26,009,600 1.28%
78 depthnet.features.2.conv.3_BatchNorm2d 32 80 80 32 80 80 64 0.78MB 819,200 1.10%
79 depthnet.features.3.conv.0 32 80 80 128 80 80 4480 9.38MB 56,524,800 1.91%
80 depthnet.features.3.conv.1 128 80 80 128 80 80 1536 9.38MB 18,841,600 0.08%
81 depthnet.features.3.conv.2_Conv2d 128 80 80 32 80 80 4096 0.78MB 52,224,000 3.76%
82 depthnet.features.3.conv.3_BatchNorm2d 32 80 80 32 80 80 64 0.78MB 819,200 0.32%
83 depthnet.features.4.conv.0 32 80 80 128 80 80 4480 9.38MB 56,524,800 0.09%
84 depthnet.features.4.conv.1 128 80 80 128 40 40 1536 2.34MB 4,710,400 0.48%
85 depthnet.features.4.conv.2_Conv2d 128 40 40 64 40 40 8192 0.39MB 26,112,000 0.78%
86 depthnet.features.4.conv.3_BatchNorm2d 64 40 40 64 40 40 128 0.39MB 409,600 0.03%
87 depthnet.features.5.conv.0 64 40 40 256 40 40 17152 4.69MB 54,476,800 1.29%
88 depthnet.features.5.conv.1 256 40 40 256 40 40 3072 4.69MB 9,420,800 0.06%
89 depthnet.features.5.conv.2_Conv2d 256 40 40 64 40 40 16384 0.39MB 52,326,400 1.68%
90 depthnet.features.5.conv.3_BatchNorm2d 64 40 40 64 40 40 128 0.39MB 409,600 0.04%
91 depthnet.features.6.conv.0 64 40 40 256 40 40 17152 4.69MB 54,476,800 0.09%
92 depthnet.features.6.conv.1 256 40 40 256 20 20 3072 1.17MB 2,355,200 0.06%
93 depthnet.features.6.conv.2_Conv2d 256 20 20 96 20 20 24576 0.15MB 19,622,400 0.56%
94 depthnet.features.6.conv.3_BatchNorm2d 96 20 20 96 20 20 192 0.15MB 153,600 0.53%
95 depthnet.features.7.conv.0 96 20 20 384 20 20 38016 1.76MB 30,259,200 1.00%
96 depthnet.features.7.conv.1 384 20 20 384 20 20 4608 1.76MB 3,532,800 0.36%
97 depthnet.features.7.conv.2_Conv2d 384 20 20 96 20 20 36864 0.15MB 29,452,800 0.04%
98 depthnet.features.7.conv.3_BatchNorm2d 96 20 20 96 20 20 192 0.15MB 153,600 0.03%
99 depthnet.features.8.conv.0 96 20 20 384 20 20 38016 1.76MB 30,259,200 0.36%
100 depthnet.features.8.conv.1 384 20 20 384 10 10 4608 0.44MB 883,200 0.07%
101 depthnet.features.8.conv.2_Conv2d 384 10 10 320 10 10 122880 0.12MB 24,544,000 1.88%
102 depthnet.features.8.conv.3_BatchNorm2d 320 10 10 320 10 10 640 0.12MB 128,000 0.04%
103 depthnet.features.9.conv.0 320 10 10 1280 10 10 413440 1.46MB 82,560,000 0.11%
104 depthnet.features.9.conv.1 1280 10 10 1280 10 10 15360 1.46MB 2,944,000 0.36%
105 depthnet.features.9.conv.2_Conv2d 1280 10 10 320 10 10 409600 0.12MB 81,888,000 7.51%
106 depthnet.features.9.conv.3_BatchNorm2d 320 10 10 320 10 10 640 0.12MB 128,000 0.08%
107 depth_fuse.d_conv1.conv.0 320 10 10 1280 10 10 413440 1.46MB 82,560,000 0.71%
108 depth_fuse.d_conv1.conv.1 1280 10 10 1280 10 10 15360 1.46MB 2,944,000 0.11%
109 depth_fuse.d_conv1.conv.2_Conv2d 1280 10 10 320 10 10 409600 0.12MB 81,888,000 0.05%
110 depth_fuse.d_conv1.conv.3_BatchNorm2d 320 10 10 320 10 10 640 0.12MB 128,000 0.04%
111 depth_fuse.d_linear 320 320 205440 0.00MB 409,280 1.07%
112 depth_fuse.d_conv2.conv.0 320 10 10 1280 10 10 413440 1.46MB 82,560,000 0.56%
113 depth_fuse.d_conv2.conv.1 1280 10 10 1280 10 10 15360 1.46MB 2,944,000 0.10%
114 depth_fuse.d_conv2.conv.2_Conv2d 1280 10 10 320 10 10 409600 0.12MB 81,888,000 0.05%
115 depth_fuse.d_conv2.conv.3_BatchNorm2d 320 10 10 320 10 10 640 0.12MB 128,000 0.04%
116 fpn.inners_a.0 16 160 160 8 160 160 152 2.34MB 7,577,600 0.82%
117 fpn.inners_a.1 24 80 80 12 80 80 324 0.88MB 4,070,400 0.42%
118 fpn.inners_a.2 32 40 40 16 40 40 560 0.29MB 1,766,400 0.41%
119 fpn.inners_a.3 96 20 20 48 20 20 4752 0.22MB 3,782,400 0.74%
120 fpn.inners_a.4 320 10 10 320 10 10 103360 0.37MB 20,640,000 1.02%
121 fpn.inners_b.0 24 80 80 8 80 80 216 0.59MB 2,713,600 0.36%
122 fpn.inners_b.1 32 40 40 12 40 40 420 0.22MB 1,324,800 0.29%
123 fpn.inners_b.2 96 20 20 16 20 20 1584 0.07MB 1,260,800 0.30%
124 fpn.inners_b.3 320 10 10 48 10 10 15504 0.05MB 3,096,000 3.49%
125 fpn.fuse.0.channel_att 16 16 544 0.00MB 1,008 0.09%
126 fpn.fuse.0.fuse.0.conv1 16 160 160 64 160 160 1216 18.75MB 60,620,800 0.09%
127 fpn.fuse.0.fuse.0.hidden_conv1_Conv2d 64 160 160 64 160 160 640 6.25MB 29,491,200 0.02%
128 fpn.fuse.0.fuse.0.hidden_conv2_Conv2d 64 160 160 64 160 160 640 6.25MB 29,491,200 0.02%
129 fpn.fuse.0.fuse.0.hidden_conv3_Conv2d 64 160 160 64 160 160 640 6.25MB 29,491,200 0.02%
130 fpn.fuse.0.fuse.0.hidden_bnact 64 160 160 64 160 160 128 12.50MB 8,192,000 0.04%
131 fpn.fuse.0.fuse.0.out_conv 64 160 160 16 160 160 1056 3.12MB 53,657,600 2.18%
132 fpn.fuse.0.fuse.1 16 160 160 16 160 160 304 4.69MB 15,155,200 0.08%
133 fpn.fuse.1.channel_att 24 24 1200 0.00MB 2,280 0.09%
134 fpn.fuse.1.fuse.0.conv1 24 80 80 96 80 80 2592 7.03MB 32,563,200 1.28%
135 fpn.fuse.1.fuse.0.hidden_conv1_Conv2d 96 80 80 96 80 80 960 2.34MB 11,059,200 0.02%
136 fpn.fuse.1.fuse.0.hidden_conv2_Conv2d 96 80 80 96 80 80 960 2.34MB 11,059,200 0.02%
137 fpn.fuse.1.fuse.0.hidden_conv3_Conv2d 96 80 80 96 80 80 960 2.34MB 11,059,200 0.02%
138 fpn.fuse.1.fuse.0.hidden_bnact 96 80 80 96 80 80 192 4.69MB 3,072,000 0.04%
139 fpn.fuse.1.fuse.0.out_conv 96 80 80 24 80 80 2352 1.17MB 29,952,000 0.06%
140 fpn.fuse.1.fuse.1 24 80 80 24 80 80 648 1.76MB 8,140,800 0.75%
141 fpn.fuse.2.channel_att 32 32 2112 0.00MB 4,064 0.10%
142 fpn.fuse.2.fuse.0.conv1 32 40 40 128 40 40 4480 2.34MB 14,131,200 0.52%
143 fpn.fuse.2.fuse.0.hidden_conv1_Conv2d 128 40 40 128 40 40 1280 0.78MB 3,686,400 0.42%
144 fpn.fuse.2.fuse.0.hidden_conv2_Conv2d 128 40 40 128 40 40 1280 0.78MB 3,686,400 0.02%
145 fpn.fuse.2.fuse.0.hidden_conv3_Conv2d 128 40 40 128 40 40 1280 0.78MB 3,686,400 0.02%
146 fpn.fuse.2.fuse.0.hidden_bnact 128 40 40 128 40 40 256 1.56MB 1,024,000 0.04%
147 fpn.fuse.2.fuse.0.out_conv 128 40 40 32 40 40 4160 0.39MB 13,260,800 0.62%
148 fpn.fuse.2.fuse.1 32 40 40 32 40 40 1120 0.59MB 3,532,800 0.34%
149 fpn.fuse.3.channel_att 96 96 18624 0.00MB 36,768 0.13%
150 fpn.fuse.3.fuse.0.conv1 96 20 20 384 20 20 38016 1.76MB 30,259,200 0.09%
151 fpn.fuse.3.fuse.0.hidden_conv1_Conv2d 384 20 20 384 20 20 3840 0.59MB 2,764,800 0.45%
152 fpn.fuse.3.fuse.0.hidden_conv2_Conv2d 384 20 20 384 20 20 3840 0.59MB 2,764,800 0.03%
153 fpn.fuse.3.fuse.0.hidden_conv3_Conv2d 384 20 20 384 20 20 3840 0.59MB 2,764,800 0.02%
154 fpn.fuse.3.fuse.0.hidden_bnact 384 20 20 384 20 20 768 1.17MB 768,000 0.04%
155 fpn.fuse.3.fuse.0.out_conv 384 20 20 96 20 20 37056 0.29MB 29,606,400 0.06%
156 fpn.fuse.3.fuse.1 96 20 20 96 20 20 9504 0.44MB 7,564,800 0.38%
157 fpn.fuse.4.fuse.0.conv1 320 10 10 1280 10 10 413440 1.46MB 82,560,000 0.08%
158 fpn.fuse.4.fuse.0.hidden_conv1_Conv2d 1280 10 10 1280 10 10 12800 0.49MB 2,304,000 0.02%
159 fpn.fuse.4.fuse.0.hidden_conv2_Conv2d 1280 10 10 1280 10 10 12800 0.49MB 2,304,000 0.19%
160 fpn.fuse.4.fuse.0.hidden_conv3_Conv2d 1280 10 10 1280 10 10 12800 0.49MB 2,304,000 0.02%
161 fpn.fuse.4.fuse.0.hidden_bnact 1280 10 10 1280 10 10 2560 0.98MB 640,000 0.04%
162 fpn.fuse.4.fuse.0.out_conv 1280 10 10 320 10 10 410240 0.24MB 82,016,000 0.06%
163 fpn.fuse.4.fuse.1 320 10 10 320 10 10 103360 0.37MB 20,640,000 0.07%
164 cls1_Conv2d 16 160 160 1 160 160 17 0.10MB 819,200 0.71%
165 cls2_Conv2d 0 0 0 0 0 0 0 0.00MB 0 0.00%
166 cls3_Conv2d 0 0 0 0 0 0 0 0.00MB 0 0.00%
167 cls4_Conv2d 0 0 0 0 0 0 0 0.00MB 0 0.00%
168 cls5_Conv2d 0 0 0 0 0 0 0 0.00MB 0 0.00%
==============================================================================================================================================
total parameters quantity: 6,544,509
total memory: 354.43MB
total MAdd: 3,122,995,800
The number of FLOPS is roughly half the number of MAdd: 3,122,995,800 MAdd / 2 ≈ 1.56 G. So the computational cost of the network at 320×320 is about 1.56 GFLOPS.
(I am sorry that I previously missed the depth branch and got a smaller FLOPS result.)
If there are no other mistakes, please use the numbers below :)
- input size 224x224, 0.76 GFLOPS
- input size 320x320, 1.56 GFLOPS
- Thanks for the explanation of the FPS. Previously we were curious whether this setup was a fair comparison to the batch-1 setup used in other papers.
- Yes, we agree with you about the FLOPs. We are very grateful for the help you have provided!
When we tested MobileSal with a batch of 20 again, the aforementioned out-of-memory issue disappeared. We are sorry that we may previously have set something up incorrectly. Thank you again for all your help!
Yeah, glad to see that all the issues are solved. I will close this issue. If you would like to discuss other topics with me, you can add my WeChat (wyh-hys) or open a new issue. Thank you so much!