FPS
gbliao opened this issue
Thanks for your great work! We are very interested in the FPS mentioned in the paper. However, after running the code in speed_test.py, we have the following questions.
(1) The paper mentions testing on a single 2080Ti GPU and reports 450 FPS, but actually running the self-defined tensor with a batch of 20 in speed_test.py on a single 2080Ti runs out of memory. So, how can we get the 450 FPS on a single 2080Ti GPU?
(2) We found that a self-defined tensor is used for the inference speed test in speed_test.py, which differs from the process in test.py that uses actual images. The comparison is perhaps unfair. So we are very curious what the exact FPS would be in test.py with a batch of 1. And what about including torch.cuda.synchronize() (see the sketch below)?
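By "including torch.cuda.synchronize()" we mean a timing loop roughly like the following sketch (variable names are illustrative; `model` stands for the MobileSal network as loaded in speed_test.py):

```python
from time import time
import torch

# Rough sketch of a synchronized batch-1 timing (illustrative; `model` is
# assumed to be the MobileSal network loaded as in speed_test.py).
x = torch.randn(1, 3, 320, 320).cuda()  # RGB input
y = torch.randn(1, 1, 320, 320).cuda()  # depth input

with torch.no_grad():
    for _ in range(50):                  # warm-up iterations
        model(x, y)
    torch.cuda.synchronize()             # make sure warm-up kernels finish
    start = time()
    for _ in range(100):
        model(x, y)
    torch.cuda.synchronize()             # wait for all queued kernels before stopping the clock
    elapsed = time() - start

print("FPS:", 100 / elapsed)
```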
Moreover, can you provide the FLOPs of the MobileSal model? We think this result would make the comparison more comprehensive. Thank you very much!
Thanks for your timely reply! Our running environment:
Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Linux version 4.15.0-45-generic (buildd@lcy01-amd64-027) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)) #48~16.04.1-Ubuntu SMP Tue Jan 29 18:03:48 UTC 2019
python 3.6.7
torch 1.5.1+cu101
torch-nightly 1.0.0.dev20181123
torchstat 0.0.7
torchvision 0.6.1+cu101
I tested the GFLOPS of MobileSal a long time ago. With an input size of 224×224, it is about 0.4 GFLOPS. Since I am currently on vacation, I will test MobileSal again and give the precise GFLOPS number next week. Thank you for your patience!
Thanks! We are curious about the GFLOPs with an input of 320x320, as mentioned in the paper. We can discuss the above issues together after the holidays. Have a nice holiday!
The GFLOPS and FPS numbers for different input sizes are as below:

| Input Size | GFLOPS | PyTorch FPS (bs=20, fp32, RTX 2080Ti) |
|---|---|---|
| 224 x 224 | 0.76G | 900 |
| 320 x 320 | 1.56G | 450 |
I am also going to re-examine the speed issue in the specific environment you listed (python=3.6, torch=1.5.1+cu101, torchvision=0.6.1+cu101). Please be patient :)
Thanks for your reply. We have tried testing the FLOPs with an input of 320 x 320 using thop.profile (the call we used is sketched after the log below), and the results are as follows.
[INFO] Register count_convNd() for <class 'torch.nn.modules.conv.Conv2d'>.
[INFO] Register count_bn() for <class 'torch.nn.modules.batchnorm.BatchNorm2d'>.
[INFO] Register zero_ops() for <class 'torch.nn.modules.activation.ReLU6'>.
[WARN] Cannot find rule for <class 'MobileNetV2.ConvBNReLU'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class 'torch.nn.modules.container.Sequential'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class 'MobileNetV2.InvertedResidual'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class 'MobileNetV2.MobileNetV2'>. Treat it as zero Macs and zero Params.
[INFO] Register zero_ops() for <class 'torch.nn.modules.activation.ReLU'>.
[WARN] Cannot find rule for <class '__main__.ConvBNReLU'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class '__main__.InvertedResidual'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class '__main__.DepthNet'>. Treat it as zero Macs and zero Params.
[INFO] Register count_linear() for <class 'torch.nn.modules.linear.Linear'>.
[WARN] Cannot find rule for <class '__main__.DepthFuseNet'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class 'torch.nn.modules.container.ModuleList'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class '__main__.IDR'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class 'torch.nn.modules.activation.Sigmoid'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class '__main__.CPR'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class '__main__.Fusion'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class '__main__.CPRDecoder'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class '__main__.MobileSal'>. Treat it as zero Macs and zero Params.
FLOPs: 1.988173288 G
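The thop call we used was roughly the following sketch (illustrative; `model` is the MobileSal network loaded as in speed_test.py):

```python
import torch
from thop import profile

# Sketch of the measurement we ran; thop's profile() returns an op count
# based on multiply-accumulates for registered layers plus the parameter count.
x = torch.randn(1, 3, 320, 320).cuda()  # RGB input
y = torch.randn(1, 1, 320, 320).cuda()  # depth input
ops, params = profile(model, inputs=(x, y))
print("FLOPs:", ops / 1e9, "G")
```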
Moreover, we are very curious what the exact FPS result would be in test.py with a batch of 1. We set the code in speed_test.py as follows; is this the right setting?
# Excerpt from our modified speed_test.py (imports as in the original file)
from time import time
from tqdm import tqdm
import torch

x = torch.randn(1, 3, 320, 320).cuda()
y = torch.randn(1, 1, 320, 320).cuda()
######################################
#### PyTorch Test [BatchSize 1] #####
######################################
for i in tqdm(range(50)):
    # warm up
    p = model(x, y)
total_t = 0
for i in tqdm(range(100)):
    start = time()
    p = model(x, y)
    total_t += time() - start
print("FPS", 100 / total_t * 1)  # batch size 1
However, we only achieve 54~60 FPS under this setting.
If you use the PyTorch backend with batch size 20, your GPU shows a high utilization rate (nvidia-smi), while GPU utilization drops to ~20% if you use the PyTorch backend with batch size 1. This behavior does not appear in methods with regular backbones.
So, you need to increase the batch size to raise GPU utilization to a favorable level. A test with the PyTorch backend and batch size 20 can achieve 450 FPS. If you use the TensorRT backend, you can get a comparable speed with batch size 1.
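For reference, a batch-20 throughput measurement along these lines would look roughly like the sketch below (variable names are illustrative; the released speed_test.py is the reference implementation):

```python
from time import time
import torch

# Rough sketch of a batch-20 throughput test; `model` is assumed to be
# the MobileSal network loaded as in speed_test.py.
batch_size, iters = 20, 100
x = torch.randn(batch_size, 3, 320, 320).cuda()
y = torch.randn(batch_size, 1, 320, 320).cuda()

with torch.no_grad():
    for _ in range(50):              # warm-up
        model(x, y)
    torch.cuda.synchronize()
    start = time()
    for _ in range(iters):
        model(x, y)
    torch.cuda.synchronize()
    total_t = time() - start

# Throughput counts every image in the batch, not every forward call.
print("FPS:", batch_size * iters / total_t)
```

The point is that throughput FPS counts batch_size × iterations images, so a larger batch keeps the GPU busy and raises the measured FPS.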
On the other hand, the log you showed me seems to reveal that your code is computing MAdd, which is about twice the FLOPS. Meanwhile, the computational cost of some ops seems not to be counted (see the warnings in the log). So you are not computing the real FLOPS number. I recommend using torchsummary to compute the GFLOPS, since it gives a specific computational cost for each layer. For convenience, I show the detailed computational cost (MAdd) of each layer reported by torchsummary below:
loading imagenet pretrained mobilenetv2
loaded imagenet pretrained mobilenetv2
Using GPU: 0
Input size: [3, 320, 320]
Using COMPLEX summary mode:
module name input shape output shape parameter quantity inference memory(MB) MAdd duration percent
0 backbone.features.0 3 320 320 32 160 160 928 9.38MB 47,513,600 3.97%
1 backbone.features.1.conv.0 32 160 160 32 160 160 352 9.38MB 18,022,400 0.11%
2 backbone.features.1.conv.1_Conv2d 32 160 160 16 160 160 512 1.56MB 25,804,800 2.16%
3 backbone.features.1.conv.2_BatchNorm2d 16 160 160 16 160 160 32 1.56MB 1,638,400 0.05%
4 backbone.features.2.conv.0 16 160 160 96 160 160 1728 28.12MB 88,473,600 4.36%
5 backbone.features.2.conv.1 96 160 160 96 80 80 1056 7.03MB 13,516,800 0.10%
6 backbone.features.2.conv.2_Conv2d 96 80 80 24 80 80 2304 0.59MB 29,337,600 2.31%
7 backbone.features.2.conv.3_BatchNorm2d 24 80 80 24 80 80 48 0.59MB 614,400 0.07%
8 backbone.features.3.conv.0 24 80 80 144 80 80 3744 10.55MB 47,923,200 3.07%
9 backbone.features.3.conv.1 144 80 80 144 80 80 1584 10.55MB 20,275,200 0.09%
10 backbone.features.3.conv.2_Conv2d 144 80 80 24 80 80 3456 0.59MB 44,083,200 1.81%
11 backbone.features.3.conv.3_BatchNorm2d 24 80 80 24 80 80 48 0.59MB 614,400 0.06%
12 backbone.features.4.conv.0 24 80 80 144 80 80 3744 10.55MB 47,923,200 0.11%
13 backbone.features.4.conv.1 144 80 80 144 40 40 1584 2.64MB 5,068,800 0.32%
14 backbone.features.4.conv.2_Conv2d 144 40 40 32 40 40 4608 0.20MB 14,694,400 1.25%
15 backbone.features.4.conv.3_BatchNorm2d 32 40 40 32 40 40 64 0.20MB 204,800 0.04%
16 backbone.features.5.conv.0 32 40 40 192 40 40 6528 3.52MB 20,889,600 0.89%
17 backbone.features.5.conv.1 192 40 40 192 40 40 2112 3.52MB 6,758,400 0.07%
18 backbone.features.5.conv.2_Conv2d 192 40 40 32 40 40 6144 0.20MB 19,609,600 0.75%
19 backbone.features.5.conv.3_BatchNorm2d 32 40 40 32 40 40 64 0.20MB 204,800 0.08%
20 backbone.features.6.conv.0 32 40 40 192 40 40 6528 3.52MB 20,889,600 0.14%
21 backbone.features.6.conv.1 192 40 40 192 40 40 2112 3.52MB 6,758,400 0.11%
22 backbone.features.6.conv.2_Conv2d 192 40 40 32 40 40 6144 0.20MB 19,609,600 0.05%
23 backbone.features.6.conv.3_BatchNorm2d 32 40 40 32 40 40 64 0.20MB 204,800 0.04%
24 backbone.features.7.conv.0 32 40 40 192 40 40 6528 3.52MB 20,889,600 0.13%
25 backbone.features.7.conv.1 192 40 40 192 20 20 2112 0.88MB 1,689,600 0.10%
26 backbone.features.7.conv.2_Conv2d 192 20 20 64 20 20 12288 0.10MB 9,804,800 0.48%
27 backbone.features.7.conv.3_BatchNorm2d 64 20 20 64 20 20 128 0.10MB 102,400 0.04%
28 backbone.features.8.conv.0 64 20 20 384 20 20 25344 1.76MB 20,275,200 0.52%
29 backbone.features.8.conv.1 384 20 20 384 20 20 4224 1.76MB 3,379,200 0.47%
30 backbone.features.8.conv.2_Conv2d 384 20 20 64 20 20 24576 0.10MB 19,635,200 0.57%
31 backbone.features.8.conv.3_BatchNorm2d 64 20 20 64 20 20 128 0.10MB 102,400 0.04%
32 backbone.features.9.conv.0 64 20 20 384 20 20 25344 1.76MB 20,275,200 0.30%
33 backbone.features.9.conv.1 384 20 20 384 20 20 4224 1.76MB 3,379,200 0.54%
34 backbone.features.9.conv.2_Conv2d 384 20 20 64 20 20 24576 0.10MB 19,635,200 0.04%
35 backbone.features.9.conv.3_BatchNorm2d 64 20 20 64 20 20 128 0.10MB 102,400 0.03%
36 backbone.features.10.conv.0 64 20 20 384 20 20 25344 1.76MB 20,275,200 0.08%
37 backbone.features.10.conv.1 384 20 20 384 20 20 4224 1.76MB 3,379,200 0.83%
38 backbone.features.10.conv.2_Conv2d 384 20 20 64 20 20 24576 0.10MB 19,635,200 0.03%
39 backbone.features.10.conv.3_BatchNorm2d 64 20 20 64 20 20 128 0.10MB 102,400 0.03%
40 backbone.features.11.conv.0 64 20 20 384 20 20 25344 1.76MB 20,275,200 0.24%
41 backbone.features.11.conv.1 384 20 20 384 20 20 4224 1.76MB 3,379,200 0.06%
42 backbone.features.11.conv.2_Conv2d 384 20 20 96 20 20 36864 0.15MB 29,452,800 0.82%
43 backbone.features.11.conv.3_BatchNorm2d 96 20 20 96 20 20 192 0.15MB 153,600 0.03%
44 backbone.features.12.conv.0 96 20 20 576 20 20 56448 2.64MB 45,158,400 1.33%
45 backbone.features.12.conv.1 576 20 20 576 20 20 6336 2.64MB 5,068,800 0.25%
46 backbone.features.12.conv.2_Conv2d 576 20 20 96 20 20 55296 0.15MB 44,198,400 2.44%
47 backbone.features.12.conv.3_BatchNorm2d 96 20 20 96 20 20 192 0.15MB 153,600 0.03%
48 backbone.features.13.conv.0 96 20 20 576 20 20 56448 2.64MB 45,158,400 0.39%
49 backbone.features.13.conv.1 576 20 20 576 20 20 6336 2.64MB 5,068,800 0.26%
50 backbone.features.13.conv.2_Conv2d 576 20 20 96 20 20 55296 0.15MB 44,198,400 0.03%
51 backbone.features.13.conv.3_BatchNorm2d 96 20 20 96 20 20 192 0.15MB 153,600 0.03%
52 backbone.features.14.conv.0 96 20 20 576 20 20 56448 2.64MB 45,158,400 0.54%
53 backbone.features.14.conv.1 576 20 20 576 10 10 6336 0.66MB 1,267,200 0.06%
54 backbone.features.14.conv.2_Conv2d 576 10 10 160 10 10 92160 0.06MB 18,416,000 1.48%
55 backbone.features.14.conv.3_BatchNorm2d 160 10 10 160 10 10 320 0.06MB 64,000 0.06%
56 backbone.features.15.conv.0 160 10 10 960 10 10 155520 1.10MB 31,104,000 2.87%
57 backbone.features.15.conv.1 960 10 10 960 10 10 10560 1.10MB 2,112,000 0.13%
58 backbone.features.15.conv.2_Conv2d 960 10 10 160 10 10 153600 0.06MB 30,704,000 2.30%
59 backbone.features.15.conv.3_BatchNorm2d 160 10 10 160 10 10 320 0.06MB 64,000 0.03%
60 backbone.features.16.conv.0 160 10 10 960 10 10 155520 1.10MB 31,104,000 0.07%
61 backbone.features.16.conv.1 960 10 10 960 10 10 10560 1.10MB 2,112,000 0.52%
62 backbone.features.16.conv.2_Conv2d 960 10 10 160 10 10 153600 0.06MB 30,704,000 0.02%
63 backbone.features.16.conv.3_BatchNorm2d 160 10 10 160 10 10 320 0.06MB 64,000 0.02%
64 backbone.features.17.conv.0 160 10 10 960 10 10 155520 1.10MB 31,104,000 0.05%
65 backbone.features.17.conv.1 960 10 10 960 10 10 10560 1.10MB 2,112,000 0.44%
66 backbone.features.17.conv.2_Conv2d 960 10 10 320 10 10 307200 0.12MB 61,408,000 2.60%
67 backbone.features.17.conv.3_BatchNorm2d 320 10 10 320 10 10 640 0.12MB 128,000 0.03%
68 backbone.features.18 320 10 10 1280 10 10 412160 1.46MB 82,432,000 3.39%
69 depthnet.features.0.conv.0 1 320 320 1 160 160 12 0.29MB 588,800 2.08%
70 depthnet.features.0.conv.1_Conv2d 1 160 160 16 160 160 16 1.56MB 409,600 0.42%
71 depthnet.features.0.conv.2_BatchNorm2d 16 160 160 16 160 160 32 1.56MB 1,638,400 0.03%
72 depthnet.features.1.conv.0 16 160 160 16 160 160 192 4.69MB 9,420,800 0.06%
73 depthnet.features.1.conv.1_Conv2d 16 160 160 16 160 160 256 1.56MB 12,697,600 0.93%
74 depthnet.features.1.conv.2_BatchNorm2d 16 160 160 16 160 160 32 1.56MB 1,638,400 0.03%
75 depthnet.features.2.conv.0 16 160 160 64 160 160 1216 18.75MB 60,620,800 3.78%
76 depthnet.features.2.conv.1 64 160 160 64 80 80 768 4.69MB 9,420,800 0.08%
77 depthnet.features.2.conv.2_Conv2d 64 80 80 32 80 80 2048 0.78MB 26,009,600 1.28%
78 depthnet.features.2.conv.3_BatchNorm2d 32 80 80 32 80 80 64 0.78MB 819,200 1.10%
79 depthnet.features.3.conv.0 32 80 80 128 80 80 4480 9.38MB 56,524,800 1.91%
80 depthnet.features.3.conv.1 128 80 80 128 80 80 1536 9.38MB 18,841,600 0.08%
81 depthnet.features.3.conv.2_Conv2d 128 80 80 32 80 80 4096 0.78MB 52,224,000 3.76%
82 depthnet.features.3.conv.3_BatchNorm2d 32 80 80 32 80 80 64 0.78MB 819,200 0.32%
83 depthnet.features.4.conv.0 32 80 80 128 80 80 4480 9.38MB 56,524,800 0.09%
84 depthnet.features.4.conv.1 128 80 80 128 40 40 1536 2.34MB 4,710,400 0.48%
85 depthnet.features.4.conv.2_Conv2d 128 40 40 64 40 40 8192 0.39MB 26,112,000 0.78%
86 depthnet.features.4.conv.3_BatchNorm2d 64 40 40 64 40 40 128 0.39MB 409,600 0.03%
87 depthnet.features.5.conv.0 64 40 40 256 40 40 17152 4.69MB 54,476,800 1.29%
88 depthnet.features.5.conv.1 256 40 40 256 40 40 3072 4.69MB 9,420,800 0.06%
89 depthnet.features.5.conv.2_Conv2d 256 40 40 64 40 40 16384 0.39MB 52,326,400 1.68%
90 depthnet.features.5.conv.3_BatchNorm2d 64 40 40 64 40 40 128 0.39MB 409,600 0.04%
91 depthnet.features.6.conv.0 64 40 40 256 40 40 17152 4.69MB 54,476,800 0.09%
92 depthnet.features.6.conv.1 256 40 40 256 20 20 3072 1.17MB 2,355,200 0.06%
93 depthnet.features.6.conv.2_Conv2d 256 20 20 96 20 20 24576 0.15MB 19,622,400 0.56%
94 depthnet.features.6.conv.3_BatchNorm2d 96 20 20 96 20 20 192 0.15MB 153,600 0.53%
95 depthnet.features.7.conv.0 96 20 20 384 20 20 38016 1.76MB 30,259,200 1.00%
96 depthnet.features.7.conv.1 384 20 20 384 20 20 4608 1.76MB 3,532,800 0.36%
97 depthnet.features.7.conv.2_Conv2d 384 20 20 96 20 20 36864 0.15MB 29,452,800 0.04%
98 depthnet.features.7.conv.3_BatchNorm2d 96 20 20 96 20 20 192 0.15MB 153,600 0.03%
99 depthnet.features.8.conv.0 96 20 20 384 20 20 38016 1.76MB 30,259,200 0.36%
100 depthnet.features.8.conv.1 384 20 20 384 10 10 4608 0.44MB 883,200 0.07%
101 depthnet.features.8.conv.2_Conv2d 384 10 10 320 10 10 122880 0.12MB 24,544,000 1.88%
102 depthnet.features.8.conv.3_BatchNorm2d 320 10 10 320 10 10 640 0.12MB 128,000 0.04%
103 depthnet.features.9.conv.0 320 10 10 1280 10 10 413440 1.46MB 82,560,000 0.11%
104 depthnet.features.9.conv.1 1280 10 10 1280 10 10 15360 1.46MB 2,944,000 0.36%
105 depthnet.features.9.conv.2_Conv2d 1280 10 10 320 10 10 409600 0.12MB 81,888,000 7.51%
106 depthnet.features.9.conv.3_BatchNorm2d 320 10 10 320 10 10 640 0.12MB 128,000 0.08%
107 depth_fuse.d_conv1.conv.0 320 10 10 1280 10 10 413440 1.46MB 82,560,000 0.71%
108 depth_fuse.d_conv1.conv.1 1280 10 10 1280 10 10 15360 1.46MB 2,944,000 0.11%
109 depth_fuse.d_conv1.conv.2_Conv2d 1280 10 10 320 10 10 409600 0.12MB 81,888,000 0.05%
110 depth_fuse.d_conv1.conv.3_BatchNorm2d 320 10 10 320 10 10 640 0.12MB 128,000 0.04%
111 depth_fuse.d_linear 320 320 205440 0.00MB 409,280 1.07%
112 depth_fuse.d_conv2.conv.0 320 10 10 1280 10 10 413440 1.46MB 82,560,000 0.56%
113 depth_fuse.d_conv2.conv.1 1280 10 10 1280 10 10 15360 1.46MB 2,944,000 0.10%
114 depth_fuse.d_conv2.conv.2_Conv2d 1280 10 10 320 10 10 409600 0.12MB 81,888,000 0.05%
115 depth_fuse.d_conv2.conv.3_BatchNorm2d 320 10 10 320 10 10 640 0.12MB 128,000 0.04%
116 fpn.inners_a.0 16 160 160 8 160 160 152 2.34MB 7,577,600 0.82%
117 fpn.inners_a.1 24 80 80 12 80 80 324 0.88MB 4,070,400 0.42%
118 fpn.inners_a.2 32 40 40 16 40 40 560 0.29MB 1,766,400 0.41%
119 fpn.inners_a.3 96 20 20 48 20 20 4752 0.22MB 3,782,400 0.74%
120 fpn.inners_a.4 320 10 10 320 10 10 103360 0.37MB 20,640,000 1.02%
121 fpn.inners_b.0 24 80 80 8 80 80 216 0.59MB 2,713,600 0.36%
122 fpn.inners_b.1 32 40 40 12 40 40 420 0.22MB 1,324,800 0.29%
123 fpn.inners_b.2 96 20 20 16 20 20 1584 0.07MB 1,260,800 0.30%
124 fpn.inners_b.3 320 10 10 48 10 10 15504 0.05MB 3,096,000 3.49%
125 fpn.fuse.0.channel_att 16 16 544 0.00MB 1,008 0.09%
126 fpn.fuse.0.fuse.0.conv1 16 160 160 64 160 160 1216 18.75MB 60,620,800 0.09%
127 fpn.fuse.0.fuse.0.hidden_conv1_Conv2d 64 160 160 64 160 160 640 6.25MB 29,491,200 0.02%
128 fpn.fuse.0.fuse.0.hidden_conv2_Conv2d 64 160 160 64 160 160 640 6.25MB 29,491,200 0.02%
129 fpn.fuse.0.fuse.0.hidden_conv3_Conv2d 64 160 160 64 160 160 640 6.25MB 29,491,200 0.02%
130 fpn.fuse.0.fuse.0.hidden_bnact 64 160 160 64 160 160 128 12.50MB 8,192,000 0.04%
131 fpn.fuse.0.fuse.0.out_conv 64 160 160 16 160 160 1056 3.12MB 53,657,600 2.18%
132 fpn.fuse.0.fuse.1 16 160 160 16 160 160 304 4.69MB 15,155,200 0.08%
133 fpn.fuse.1.channel_att 24 24 1200 0.00MB 2,280 0.09%
134 fpn.fuse.1.fuse.0.conv1 24 80 80 96 80 80 2592 7.03MB 32,563,200 1.28%
135 fpn.fuse.1.fuse.0.hidden_conv1_Conv2d 96 80 80 96 80 80 960 2.34MB 11,059,200 0.02%
136 fpn.fuse.1.fuse.0.hidden_conv2_Conv2d 96 80 80 96 80 80 960 2.34MB 11,059,200 0.02%
137 fpn.fuse.1.fuse.0.hidden_conv3_Conv2d 96 80 80 96 80 80 960 2.34MB 11,059,200 0.02%
138 fpn.fuse.1.fuse.0.hidden_bnact 96 80 80 96 80 80 192 4.69MB 3,072,000 0.04%
139 fpn.fuse.1.fuse.0.out_conv 96 80 80 24 80 80 2352 1.17MB 29,952,000 0.06%
140 fpn.fuse.1.fuse.1 24 80 80 24 80 80 648 1.76MB 8,140,800 0.75%
141 fpn.fuse.2.channel_att 32 32 2112 0.00MB 4,064 0.10%
142 fpn.fuse.2.fuse.0.conv1 32 40 40 128 40 40 4480 2.34MB 14,131,200 0.52%
143 fpn.fuse.2.fuse.0.hidden_conv1_Conv2d 128 40 40 128 40 40 1280 0.78MB 3,686,400 0.42%
144 fpn.fuse.2.fuse.0.hidden_conv2_Conv2d 128 40 40 128 40 40 1280 0.78MB 3,686,400 0.02%
145 fpn.fuse.2.fuse.0.hidden_conv3_Conv2d 128 40 40 128 40 40 1280 0.78MB 3,686,400 0.02%
146 fpn.fuse.2.fuse.0.hidden_bnact 128 40 40 128 40 40 256 1.56MB 1,024,000 0.04%
147 fpn.fuse.2.fuse.0.out_conv 128 40 40 32 40 40 4160 0.39MB 13,260,800 0.62%
148 fpn.fuse.2.fuse.1 32 40 40 32 40 40 1120 0.59MB 3,532,800 0.34%
149 fpn.fuse.3.channel_att 96 96 18624 0.00MB 36,768 0.13%
150 fpn.fuse.3.fuse.0.conv1 96 20 20 384 20 20 38016 1.76MB 30,259,200 0.09%
151 fpn.fuse.3.fuse.0.hidden_conv1_Conv2d 384 20 20 384 20 20 3840 0.59MB 2,764,800 0.45%
152 fpn.fuse.3.fuse.0.hidden_conv2_Conv2d 384 20 20 384 20 20 3840 0.59MB 2,764,800 0.03%
153 fpn.fuse.3.fuse.0.hidden_conv3_Conv2d 384 20 20 384 20 20 3840 0.59MB 2,764,800 0.02%
154 fpn.fuse.3.fuse.0.hidden_bnact 384 20 20 384 20 20 768 1.17MB 768,000 0.04%
155 fpn.fuse.3.fuse.0.out_conv 384 20 20 96 20 20 37056 0.29MB 29,606,400 0.06%
156 fpn.fuse.3.fuse.1 96 20 20 96 20 20 9504 0.44MB 7,564,800 0.38%
157 fpn.fuse.4.fuse.0.conv1 320 10 10 1280 10 10 413440 1.46MB 82,560,000 0.08%
158 fpn.fuse.4.fuse.0.hidden_conv1_Conv2d 1280 10 10 1280 10 10 12800 0.49MB 2,304,000 0.02%
159 fpn.fuse.4.fuse.0.hidden_conv2_Conv2d 1280 10 10 1280 10 10 12800 0.49MB 2,304,000 0.19%
160 fpn.fuse.4.fuse.0.hidden_conv3_Conv2d 1280 10 10 1280 10 10 12800 0.49MB 2,304,000 0.02%
161 fpn.fuse.4.fuse.0.hidden_bnact 1280 10 10 1280 10 10 2560 0.98MB 640,000 0.04%
162 fpn.fuse.4.fuse.0.out_conv 1280 10 10 320 10 10 410240 0.24MB 82,016,000 0.06%
163 fpn.fuse.4.fuse.1 320 10 10 320 10 10 103360 0.37MB 20,640,000 0.07%
164 cls1_Conv2d 16 160 160 1 160 160 17 0.10MB 819,200 0.71%
165 cls2_Conv2d 0 0 0 0 0 0 0 0.00MB 0 0.00%
166 cls3_Conv2d 0 0 0 0 0 0 0 0.00MB 0 0.00%
167 cls4_Conv2d 0 0 0 0 0 0 0 0.00MB 0 0.00%
168 cls5_Conv2d 0 0 0 0 0 0 0 0.00MB 0 0.00%
==============================================================================================================================================
total parameters quantity: 6,544,509
total memory: 354.43MB
total MAdd: 3,122,995,800
The number of FLOPS is roughly half the number of MAdd: 3,122,995,800 MAdd / 2 ≈ 1.56 G. So the computational cost of the network at 320×320 is about 1.56 GFLOPS.
(I am sorry that I previously missed the depth branch and got a smaller FLOPS result.)
If there are no other mistakes, please use the numbers below :)
- input size 224x224, 0.76 GFLOPS
- input size 320x320, 1.56 GFLOPS
- Thanks for the explanation of the FPS. Previously we were curious whether this setup was a fair comparison to the batch-1 setup used in other papers.
- Yes, we agree with you about the FLOPs. We are very grateful for the help you have provided!
When we tested MobileSal with a batch of 20 again, the aforementioned out-of-memory issue disappeared. We are sorry that we may previously have set something up incorrectly. Thank you again for all your help!
Yeah, glad to see that all the issues are solved. I will close this issue. If you would like to discuss other topics with me, you can add my WeChat (wyh-hys) or open a new issue. Thank you so much!