JingyunLiang / SwinIR

SwinIR: Image Restoration Using Swin Transformer (official repository)

Home Page: https://arxiv.org/abs/2108.10257

About the flops of SwinIR model

hcy96 opened this issue · comments

commented

Hi, thanks for your great work!

I'm confused about the MACs and FLOPs of SwinIR.

Do you calculate MACs rather than FLOPs in the flops() function in network_swinir.py?

BTW, is there a bug at line 848 of network_swinir.py?

commented

I also noticed this problem, and it caused me some trouble. Strictly speaking, MACs, i.e., Mult-Adds, should equal FLOPs/2. So the code actually calculates MACs, and the method name is misleading.
What confuses me most is that the paper says the MACs are evaluated on a 1280x720 image, but they are actually obtained on a 1024x720 image, as implied by the code.
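
To make the MACs-vs-FLOPs distinction concrete, here is a minimal sketch (the layer shape is made up for illustration): a convolution performs one multiply and one add per kernel tap, so the FLOP count is roughly twice the MAC count.

```python
# Illustrative only: MACs vs. FLOPs for a single conv layer (shapes are made up).
def conv_macs(c_in, c_out, k, h_out, w_out):
    # one multiply-accumulate per kernel tap per output element
    return c_in * k * k * c_out * h_out * w_out

macs = conv_macs(c_in=60, c_out=60, k=3, h_out=720, w_out=1024)
flops = 2 * macs  # each MAC = one multiplication + one addition
print(f"MACs: {macs / 1e9:.2f}G, FLOPs: {flops / 1e9:.2f}G")
```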

I was confused by this as well, but after seeing this comment I found the 1024x720 issue. Thank you for your comment. I think it should be corrected for fair comparison, both in this paper and in future papers; it is a very important factor for lightweight SISR. However, neither the arXiv nor the ICCV version has been revised.
Please check these comments and revise the paper.
Additionally, the parameter counts reported in the paper omit the relative_position_bias_table. The correct #Params and #Mult-Adds are listed below, with a sketch for reproducing the parameter counts after the lists.
(note: the LQ input resolutions for Mult-Adds follow main_test_swinir.py)

#Mult-Adds
(scale: in paper -> actual)
x2: 195.6G -> 243.7G (evaluated on 648x368 LQ)
x3: 87.2G -> 109.5G (evaluated on 432x248 LQ)
x4: 49.6G -> 61.7G (evaluated on 328x184 LQ)
(a very large discrepancy, especially for x2!)

#Params
(scale: in paper -> actual)
x2: 877,752 -> 910,152
x3: 885,867 -> 918,267
x4: 897,228 -> 929,628
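
For reference, a minimal sketch of how both parameter figures can be reproduced (assuming model is a constructed SwinIR instance; relative_position_bias_table is the attribute name used in network_swinir.py):

```python
# Count all learnable parameters, then subtract the relative position bias
# tables to recover the (smaller) figure reported in the paper.
total = sum(p.numel() for _, p in model.named_parameters())
bias_tables = sum(p.numel() for name, p in model.named_parameters()
                  if "relative_position_bias_table" in name)
print(f"all params: {total}, without bias tables: {total - bias_tables}")
```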

Why do I get Mult-Adds that are not that large?

Totals
Total params 897.228k
Trainable params 897.228k
Non-trainable params 0.0
Mult-Adds 11.43981228G

This is the x4 model, calculated with torchsummaryX.
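
For context, this is roughly how I ran it (the constructor arguments for the lightweight x4 model are my best guess from main_test_swinir.py):

```python
import torch
from torchsummaryX import summary
from models.network_swinir import SwinIR

# lightweight x4 configuration (assumed), evaluated on a 328x184 LQ input
model = SwinIR(upscale=4, img_size=64, window_size=8, img_range=1.,
               depths=[6, 6, 6, 6], embed_dim=60, num_heads=[6, 6, 6, 6],
               mlp_ratio=2, upsampler='pixelshuffledirect')
summary(model, torch.zeros((1, 3, 184, 328)))  # NCHW: height 184, width 328
```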

I don't remember exactly, because my calculation was done quite a while ago.
But as I recall, the torchsummaryX package you mention does not handle the self-attention mechanism properly.

In computing self-attention, we multiply Q by K, then multiply the QK result by V.
This takes a considerable amount of computation; consider that the resolution and channel counts of the Q, K, and V matrices are large.
But as far as I know, that package cannot account for these matrix multiplications.

So it is better to manually implement a flops function for each module in SwinIR, especially for the window self-attention part.
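
For illustration, a per-window MAC count in the style of the flops() methods in network_swinir.py might look like this (names are illustrative; N is the number of tokens in a window, dim the embedding dimension):

```python
def window_attention_macs(N, dim, num_heads):
    head_dim = dim // num_heads
    macs = N * dim * 3 * dim               # qkv projection
    macs += num_heads * N * head_dim * N   # attn = q @ k.transpose(-2, -1)
    macs += num_heads * N * N * head_dim   # x = attn @ v
    macs += N * dim * dim                  # output projection
    return macs
```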

I have not used that package for counting parameters.
In my case, I always use model.named_parameters() (where model is a torch.nn.Module instance).
Since this method yields all of the learnable parameters, it is the most accurate way to count them.
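
In code, the counting method above is a one-liner (model is any torch.nn.Module):

```python
num_params = sum(p.numel() for _, p in model.named_parameters())
```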

Thank you.

Thanks for your quick and kind reply!
I also noticed that some of the popular parameter-summary libraries don't count MSA when calculating MACs.
It seems that the ptflops library can handle MSA, but its results are a little smaller than yours. :)

Computational complexity: 52.05 GMac
Number of parameters: 929.63 k
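
For reference, a minimal sketch of how such numbers can be obtained with ptflops (the input resolution is the 328x184 LQ size from the corrected table above; the model construction is assumed):

```python
from ptflops import get_model_complexity_info

# model: a constructed x4 SwinIR instance; input resolution is (C, H, W)
macs, params = get_model_complexity_info(model, (3, 184, 328),
                                         as_strings=True,
                                         print_per_layer_stat=False)
print(f"Computational complexity: {macs}")
print(f"Number of parameters: {params}")
```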

Thanks to your reply, I remembered what I did before to get an accurate calculation.
When I tried the same thing with ptflops, I found that the Q, K, V matrix multiplications were not counted, as I mentioned above.

The ptflops package works well on code implemented with nn.Module forward functions (e.g., nn.Linear, nn.Conv2d, nn.ConvTranspose2d, ...).
But it cannot count operations implemented as general matrix multiplications, like

F.softmax(Q @ K.transpose(-1, -2), dim=-1) @ V

So I recommend implementing functions that count the actual multiplication operations yourself.
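
As a sketch, the two batched matmuls in the expression above can be counted directly from the tensor shapes (Q, K, V of shape (B, heads, N, d); names are illustrative):

```python
def attention_matmul_macs(B, heads, N, d):
    qk = B * heads * N * N * d  # Q @ K.transpose(-1, -2): (N, d) x (d, N) per head
    av = B * heads * N * N * d  # attn @ V: (N, N) x (N, d) per head
    return qk + av
```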

Good luck!