huggingface / pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Includes train, eval, inference, and export scripts, and pretrained weights -- ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more

Home Page: https://huggingface.co/docs/timm


DropPath Implementation

IsmaelElsharkawi opened this issue · comments

```python
def drop_path(x, drop_prob: float = 0., training: bool = False):
```

Hi,
I have two questions about the implementation of DropPath:

  1. Why is it done per sample? As far as I understand from https://arxiv.org/pdf/1603.09382.pdf, you either keep the whole batch or drop it altogether with probability p_l, so why is it applied per sample here?
  2. What is the `.div(keep_prob)` used for? I can't see that in the paper's equations either. Can you please clarify the reason behind it?

@IsmaelElsharkawi This sort of question is more appropriate as a discussion. Stochastic depth is applied per sample, not per batch; I believe the paper says 'independently per sample' somewhere.

The rescale follows the (somewhat convoluted) eq(5) and its surrounding explanation: because only a fraction of the activations participate in the output during training, the survivors need to be scaled up by 1 / keep_prob so the expected output matches the no-drop case.
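To make both points concrete, here is a minimal sketch of per-sample stochastic depth with the rescale, assuming the usual batch-first tensor layout. This is an illustrative reimplementation under those assumptions, not the exact timm code; the function name `drop_path_sketch` is made up for this example.

```python
import torch

def drop_path_sketch(x: torch.Tensor, drop_prob: float = 0.,
                     training: bool = False) -> torch.Tensor:
    """Zero a whole sample's residual path with probability drop_prob,
    independently per sample, and rescale survivors by 1 / keep_prob."""
    if drop_prob == 0. or not training:
        return x
    keep_prob = 1.0 - drop_prob
    # One Bernoulli draw per sample (the batch dim); the trailing 1s
    # broadcast the same keep/drop decision over all other dims.
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = x.new_empty(shape).bernoulli_(keep_prob)
    # Divide by keep_prob so E[output] == x: kept samples are scaled up
    # to compensate for the dropped ones.
    return x * mask / keep_prob
```

Note that with `training=False` the function is the identity, which is why no extra rescaling is needed at inference time.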

Thanks a lot for the explanation, and sorry about that; I'll continue this in a discussion thread.