Added FLOPs in our paper (Table 8, arXiv v4).
ku21fan opened this issue · comments
We received several requests for the FLOPs of each model, so we calculated them and updated our paper.
In this issue, we summarize the details of our FLOPs calculation.
Our FLOPs calculation is an approximation.
Our calculation is mainly based on THOP, which is not official PyTorch code but is widely used.
According to this issue and the THOP README, THOP appears to count MACs rather than FLOPs, so we simply use 2 * (# MACs) as # FLOPs.
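The MACs-to-FLOPs conversion described above can be sketched as follows. This is a minimal illustration, assuming one multiply-accumulate is counted as two floating-point operations (one multiply and one add). The helper names `macs_to_flops` and `linear_macs` and the layer sizes are hypothetical and are not part of THOP or our code.

```python
def macs_to_flops(macs):
    """Approximate FLOPs from MACs: one MAC = one multiply + one add."""
    return 2 * macs


def linear_macs(in_features, out_features):
    """Approximate MACs of a bias-free fully connected layer for one input vector.

    Each of the out_features outputs needs in_features multiply-accumulates.
    """
    return in_features * out_features


# Hypothetical layer sizes, chosen only for illustration.
macs = linear_macs(512, 256)
flops = macs_to_flops(macs)
print(macs, flops)
```

The factor of 2 is a convention, not an exact count: for example, a dot product of length n has n multiplications but only n - 1 additions.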
Our models contain some modules that THOP does not handle: GridGenerator and LSTM/LSTMCell.
We therefore calculate the FLOPs of the GridGenerator module with the following code.
def count_GridGenerator(m):
    # size
    num_fiducial_point = 20
    image_width = 32
    image_height = 100

    # count calculation
    # https://arxiv.org/pdf/1904.01906.pdf
    # we count euclidian distance (d_ij) as 3 MACs, since euclidian distance (d_ij) is root(square(c_i - c_j))
    R = num_fiducial_point * num_fiducial_point * 3 * 3  # 3600, 20x20 (size of R), 3 = square, *, ln, 3 = d_ij
    # we count matrix inversion as N^3 MACs
    inv_delta_C = (num_fiducial_point + 3) ** 3  # 12167
    T = (num_fiducial_point + 3) * (num_fiducial_point + 3) * 2  # 1058
    P = image_width * image_height * (num_fiducial_point + 3) * 2  # 147200
    total_ops = R + inv_delta_C + T + P  # 164025, about 0.164M MACs
    m.total_ops += torch.Tensor([int(total_ops)])
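The arithmetic in the comments above can be checked without PyTorch. The following is a minimal sketch that reuses the same cost model (3 * 3 MACs per entry of R, N^3 MACs for the matrix inversion); the function name `grid_generator_macs` and its keyword parameters are hypothetical and only serve to make the constants adjustable.

```python
def grid_generator_macs(num_fiducial_point=20, image_width=32, image_height=100):
    """Approximate MACs of GridGenerator, term by term.

    Hypothetical helper for illustration; it mirrors the constants used in
    count_GridGenerator but is not part of THOP.
    """
    n = num_fiducial_point + 3
    parts = {
        # F x F matrix R, 3 ops per entry times 3 MACs for each distance d_ij
        "R": num_fiducial_point * num_fiducial_point * 3 * 3,
        # matrix inversion counted as N^3 MACs
        "inv_delta_C": n ** 3,
        "T": n * n * 2,
        "P": image_width * image_height * n * 2,
    }
    parts["total"] = sum(parts.values())
    return parts


for name, value in grid_generator_macs().items():
    print(f"{name}: {value}")
```

With the default sizes, the terms are 3600, 12167, 1058, and 147200, which sum to 164025 MACs, matching the comments in the code above.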
We calculate the FLOPs of LSTM with the following code.
def count_LSTM(m, x, y):
    # size
    input_size = x.size(-1)
    hidden_state_size = y.size(-1)  # = output_size
    cell_state_size = y.size(-1)  # = output_size

    # count calculation https://pytorch.org/docs/stable/nn.html#torch.nn.LSTM
    # count sigmoid/tanh activation function as 0 MACs
    # 3*hidden_state_size = count addition operation.
    input_gate = input_size * hidden_state_size + hidden_state_size * hidden_state_size \
        + 3 * hidden_state_size
    forget_gate = input_size * hidden_state_size + hidden_state_size * hidden_state_size \
        + 3 * hidden_state_size
    cell_gate = input_size * hidden_state_size + hidden_state_size * hidden_state_size \
        + 3 * hidden_state_size
    output_gate = input_size * hidden_state_size + hidden_state_size * hidden_state_size \
        + 3 * hidden_state_size
    update_cell_state = hidden_state_size + hidden_state_size + hidden_state_size
    update_hidden_state = hidden_state_size
    total_ops = input_gate + forget_gate + cell_gate + output_gate + update_cell_state + update_hidden_state

    time_step = x.size(-2)
    m.total_ops += torch.Tensor([int(total_ops)]) * time_step
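To make the per-step count concrete, the same formula can be written as a plain Python function and evaluated for example sizes. This is a sketch under the same assumptions as the code above (activations counted as 0 MACs, a single layer, a single direction); the function name `lstm_macs` and the example sizes (256 inputs, 256 hidden units, 26 time steps) are hypothetical and not taken from our models.

```python
def lstm_macs(input_size, hidden_size, time_steps):
    """Approximate MACs of one single-direction LSTM layer for one sequence.

    Hypothetical helper for illustration; it follows the count_LSTM cost model.
    """
    # Each of the 4 gates: input projection + hidden projection + 3 additions.
    per_gate = input_size * hidden_size + hidden_size * hidden_size + 3 * hidden_size
    gates = 4 * per_gate
    update_cell_state = 3 * hidden_size
    update_hidden_state = hidden_size
    per_step = gates + update_cell_state + update_hidden_state
    return per_step * time_steps


# Hypothetical sizes, chosen only for illustration.
print(lstm_macs(256, 256, 1))
print(lstm_macs(256, 256, 26))
```

For these sizes, each gate costs 131840 MACs, one time step costs 528384 MACs, and 26 time steps cost 13737984 MACs. The two matrix products dominate; the addition and state-update terms contribute well under 1% of the total.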
We have attached our modified THOP profile code, and we simply use the code below to calculate FLOPs.
import torch
import model
from thop import profile

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
input = torch.randn(1, 1, 32, 100).to(device)
text_for_pred = torch.LongTensor(1, opt.batch_max_length + 1).fill_(0).to(device)
model_ = model.Model(opt).to(device)

MACs, params = profile(model_, inputs=(input, text_for_pred, ))
flops = 2 * MACs  # approximate FLOPs
If you find any problems, please let us know.
In your paper, is it the number of floating-point operations (FLOPs) rather than floating-point operations per second (FLOPS)?
You are right. It is FLOPs rather than FLOPS.
That was my mistake.
Thank you for the comment :)