sail-sg / iFormer

iFormer: Inception Transformer

Repetition

lijain opened this issue

Thank you for making your work public. I find it interesting and am reproducing your architecture, but I have a few questions I hope you can answer:

  1. In the last two stages your pool stride is 1. Is the avgPool 3x3 with s=1 and padding=2? And how do you handle the upsample in those stages?
  2. How many heads does the attention for the high-frequency component use?

Hello, can you tell me how to implement the maxpool branch? I am confused about how to keep the feature map size the same after the maxpool operation. What is the configuration of the maxpool and of the linear layer after it?
Thank you!

Reference: https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html

torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

Shape:

Input: $(N, C, H_{in}, W_{in})$ or $(C, H_{in}, W_{in})$

Output: $(N, C, H_{out}, W_{out})$ or $(C, H_{out}, W_{out})$

where

$$ \begin{gather*} H_{out} = \left\lfloor\frac{H_{in} + 2 * \text{padding[0]} - \text{dilation[0]} \times (\text{kernelsize[0]} - 1) - 1}{\text{stride[0]}} + 1\right\rfloor \\ W_{out} = \left\lfloor\frac{W_{in} + 2 * \text{padding[1]} - \text{dilation[1]} \times (\text{kernelsize[1]} - 1) - 1}{\text{stride[1]}} + 1\right\rfloor \end{gather*} $$
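The formula above can be written as a small helper for checking a configuration by hand. This is only a sketch for one spatial dimension: the name `pool_out_size` is made up for illustration and is not part of PyTorch or of the iFormer code. It assumes `ceil_mode=False` and that `stride` defaults to `kernel_size`, as in the signature above.

```python
def pool_out_size(size_in, kernel_size, stride=None, padding=0, dilation=1):
    """Output length along one spatial dimension (hypothetical helper).

    Mirrors the MaxPool2d shape formula with ceil_mode=False.
    """
    if stride is None:
        # Same default as torch.nn.MaxPool2d: stride falls back to kernel_size.
        stride = kernel_size
    numerator = size_in + 2 * padding - dilation * (kernel_size - 1) - 1
    # Integer floor division matches the floor in the formula,
    # because the trailing "+ 1" is an integer.
    return numerator // stride + 1


print(pool_out_size(14, kernel_size=2))                       # 7
print(pool_out_size(14, kernel_size=3, stride=1, padding=1))  # 14
```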

Usually we just set kernel_size=2 and leave stride at its default, which is kernel_size (so stride=2). This gives nn.MaxPool2d(kernel_size=2), and the output is half the size of the input (rounded down). We can calculate the output shape as below.

$$ \begin{align*} Out &= \left\lfloor\frac{In + 2 * padding - dilation * (kernelsize - 1) - 1}{stride} + 1\right\rfloor \\ &=\left\lfloor\frac{In + 2 * 0 - 1 * (2 - 1) - 1}{2} + 1\right\rfloor \\ &=\left\lfloor\frac{In - 1 - 1}{2} + 1\right\rfloor \\ &=\left\lfloor\frac{In}{2}\right\rfloor \end{align*} $$

But if we set kernel_size=3, stride=1, padding=1 instead, we get nn.MaxPool2d(kernel_size=3, stride=1, padding=1), and the output has the same size as the input. We can calculate the output shape as below.

$$ \begin{align*} Out &= \left\lfloor\frac{In + 2 * padding - dilation * (kernelsize - 1) - 1}{stride} + 1\right\rfloor \\ &=\left\lfloor\frac{In + 2 * 1 - 1 * (3 - 1) - 1}{1} + 1\right\rfloor \\ &=\left\lfloor In + 2 - 2 - 1 + 1 \right\rfloor \\ &=\left\lfloor In \right\rfloor \end{align*} $$
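Both results can also be checked directly in PyTorch. The snippet below is a minimal sketch, not the authors' implementation: the tensor size, the channel count of 8, and the use of a 1x1 `nn.Conv2d` as a stand-in for the "linear layer" after the maxpool are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Dummy feature map: batch 1, 8 channels, 14x14 (sizes chosen arbitrarily).
x = torch.randn(1, 8, 14, 14)

# kernel_size=2 with the default stride (= kernel_size) halves H and W.
halving_pool = nn.MaxPool2d(kernel_size=2)

# kernel_size=3, stride=1, padding=1 keeps H and W unchanged.
same_size_pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

# Assumption: a 1x1 convolution acts as a per-pixel linear layer.
# It mixes channels only, so it does not change the spatial size either.
pointwise = nn.Conv2d(8, 8, kernel_size=1)

halved = halving_pool(x)
kept = pointwise(same_size_pool(x))

print(halved.shape)  # torch.Size([1, 8, 7, 7])
print(kept.shape)    # torch.Size([1, 8, 14, 14])
```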

@Ga-Lee Amazing!

@lijain We have released the code.