Question about the place of checkpoint (shared memory allocation)
mert-kurttutan opened this issue · comments
mert-kurttutan commented
Hi,
In the file model/densenet.py, the following piece of code is used to make DenseNet memory efficient:
def forward(self, *prev_features):
    bn_function = _bn_function_factory(self.norm1, self.relu1, self.conv1)
    if self.efficient and any(prev_feature.requires_grad for prev_feature in prev_features):
        bottleneck_output = cp.checkpoint(bn_function, *prev_features)
    else:
        bottleneck_output = bn_function(*prev_features)
    new_features = self.conv2(self.relu2(self.norm2(bottleneck_output)))
    if self.drop_rate > 0:
        new_features = F.dropout(new_features, p=self.drop_rate, training=self.training)
    return new_features
Above, checkpointing is applied only to the first BN-ReLU-Conv block, not to the second. Is this because the features reused by deeper layers come only from the first BN-ReLU-Conv block? The second block's features stored in memory do not lead to quadratic space complexity in the number of layers.
Geoff Pleiss commented
Is this because the features reused by deeper layers come only from the first BN-ReLU-Conv block? The second block's features stored in memory do not lead to quadratic space complexity in the number of layers.
This is correct.
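To make the counting argument concrete, here is a minimal sketch (not from the repo; the function name and the unit "feature maps" are illustrative assumptions). Each layer's bottleneck consumes the concatenation of all earlier feature maps, so keeping those concatenated inputs for backward costs 1 + 2 + ... + L = O(L^2) maps, while the second conv produces only one new map per layer, O(L) in total. Checkpointing the bottleneck removes the quadratic term; checkpointing the second block would save almost nothing.

```python
# Hypothetical counting sketch: how many feature maps sit in memory for the
# backward pass of a dense block, with and without bottleneck checkpointing.
def stored_feature_maps(num_layers, checkpoint_bottleneck):
    """Count feature maps retained for backward in a dense block.

    Layer i (1-indexed) consumes the concatenation of the i earlier maps.
    Without checkpointing, that concatenated input is kept per layer:
    1 + 2 + ... + L = O(L^2). With checkpointing, it is recomputed in
    backward instead. The second conv's output is one new map per layer,
    O(L), regardless.
    """
    bottleneck_inputs = (
        0 if checkpoint_bottleneck else sum(range(1, num_layers + 1))
    )
    second_conv_outputs = num_layers  # one new feature map per layer
    return bottleneck_inputs + second_conv_outputs

# Without checkpointing, storage grows quadratically with depth:
print(stored_feature_maps(12, checkpoint_bottleneck=False))  # → 90
# With the bottleneck checkpointed, only the O(L) new features remain:
print(stored_feature_maps(12, checkpoint_bottleneck=True))   # → 12
```

This is why `cp.checkpoint` wraps only `bn_function` in the snippet above: recomputing the cheap BN-ReLU-Conv in backward trades a little compute for eliminating the quadratic storage term.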