gpleiss / efficient_densenet_pytorch

A memory-efficient implementation of DenseNets

Question about the place of checkpoint (shared memory allocation)

mert-kurttutan opened this issue · comments

Hi,

In the file model/densenet.py, the following piece of code is used to make the DenseNet memory-efficient:

    def forward(self, *prev_features):
        # cp is torch.utils.checkpoint; F is torch.nn.functional.
        # Build the bottleneck function: norm1 -> relu1 -> conv1 over the concatenated inputs.
        bn_function = _bn_function_factory(self.norm1, self.relu1, self.conv1)
        if self.efficient and any(prev_feature.requires_grad for prev_feature in prev_features):
            # Memory-efficient path: drop the bottleneck intermediates and
            # recompute them during the backward pass.
            bottleneck_output = cp.checkpoint(bn_function, *prev_features)
        else:
            bottleneck_output = bn_function(*prev_features)
        # Second BN-ReLU-Conv block, run without checkpointing.
        new_features = self.conv2(self.relu2(self.norm2(bottleneck_output)))
        if self.drop_rate > 0:
            new_features = F.dropout(new_features, p=self.drop_rate, training=self.training)
        return new_features
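
For context, the bn_function above comes from _bn_function_factory in the same file. Paraphrased (a sketch, not the verbatim source), it concatenates all previous feature maps along the channel dimension and applies the first BN-ReLU-Conv:

    import torch

    def _bn_function_factory(norm, relu, conv):
        # Closure over the first block's modules; its inputs are all previous
        # feature maps, which are concatenated before norm -> relu -> conv.
        def bn_function(*inputs):
            concated_features = torch.cat(inputs, 1)
            bottleneck_output = conv(relu(norm(concated_features)))
            return bottleneck_output
        return bn_function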

Above, the checkpoint is applied only to the first BN-ReLU-Conv block, and not to the second. This is because the features that are reused by deeper layers come only from the first block, right? The features of the second block stored in memory do not lead to quadratic space complexity in the number of layers.
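
For intuition on what checkpointing does here: torch.utils.checkpoint runs the wrapped function without keeping its intermediate activations and recomputes them during the backward pass. A minimal, self-contained sketch (the concatenate-and-scale below is just a stand-in for norm1 -> relu1 -> conv1):

    import torch
    from torch.utils import checkpoint as cp

    def bn_function(*features):
        # Stand-in for the real bottleneck: concatenate, then a cheap op.
        return torch.cat(features, dim=1) * 2.0

    x1 = torch.randn(2, 8, 4, 4, requires_grad=True)
    x2 = torch.randn(2, 8, 4, 4, requires_grad=True)

    # Intermediates of bn_function are not kept; they are recomputed in backward.
    out = cp.checkpoint(bn_function, x1, x2)
    out.sum().backward()
    print(x1.grad.shape, x2.grad.shape)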

> This is because the features that are reused by deeper layers come only from the first block, right? The features of the second block stored in memory do not lead to quadratic space complexity in the number of layers.

This is correct.
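
To make the memory argument concrete, here is a rough channel-count sketch. The growth rate k, number of layers L, and input channel count c0 below are assumed example values, not taken from the repo:

    # Assumed values (not from the repo): growth rate k, layers per dense block L,
    # channels entering the block c0.
    k, L, c0 = 32, 24, 64

    # Input to layer i's first (checkpointed) BN-ReLU-Conv is the concatenation
    # of the block input and all earlier layers' new_features.
    concat_channels = [c0 + i * k for i in range(L)]
    print(sum(concat_channels))  # 10368 -> storing these intermediates grows ~O(L^2 * k)

    # Each layer's second conv emits only new_features with k channels.
    print(L * k)                 # 768 -> storing these grows only ~O(L * k)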