Question about the place of checkpoint (shared memory allocation)
mert-kurttutan opened this issue · comments
mert-kurttutan commented
Hi,
In the file model/densenet.py, the following piece of code is used to make DenseNet memory efficient:
def forward(self, *prev_features):
    bn_function = _bn_function_factory(self.norm1, self.relu1, self.conv1)
    if self.efficient and any(prev_feature.requires_grad for prev_feature in prev_features):
        bottleneck_output = cp.checkpoint(bn_function, *prev_features)
    else:
        bottleneck_output = bn_function(*prev_features)
    new_features = self.conv2(self.relu2(self.norm2(bottleneck_output)))
    if self.drop_rate > 0:
        new_features = F.dropout(new_features, p=self.drop_rate, training=self.training)
    return new_features
Above, checkpointing is applied only to the first BN-ReLU-Conv block, not to the second. Is this because the features reused by deeper layers come only from the first BN-ReLU-Conv block? The second block's features stored in memory do not lead to quadratic space complexity in the number of layers.
Geoff Pleiss commented
Is this because the features reused by deeper layers come only from the first BN-ReLU-Conv block? The second block's features stored in memory do not lead to quadratic space complexity in the number of layers.
This is correct.
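To make the counting argument concrete, here is a minimal sketch (not from the repo; the function name and the unit "feature maps" are illustrative assumptions). Each layer's bottleneck consumes the concatenation of all earlier feature maps, so keeping those concatenated inputs for backward costs 1 + 2 + ... + L = O(L^2) maps, while the second conv produces only one new map per layer, O(L) in total. Checkpointing the bottleneck removes the quadratic term; checkpointing the second block would save almost nothing.

```python
# Hypothetical counting sketch: how many feature maps sit in memory for the
# backward pass of a dense block, with and without bottleneck checkpointing.
def stored_feature_maps(num_layers, checkpoint_bottleneck):
    """Count feature maps retained for backward in a dense block.

    Layer i (1-indexed) consumes the concatenation of the i earlier maps.
    Without checkpointing, that concatenated input is kept per layer:
    1 + 2 + ... + L = O(L^2). With checkpointing, it is recomputed in
    backward instead. The second conv's output is one new map per layer,
    O(L), regardless.
    """
    bottleneck_inputs = (
        0 if checkpoint_bottleneck else sum(range(1, num_layers + 1))
    )
    second_conv_outputs = num_layers  # one new feature map per layer
    return bottleneck_inputs + second_conv_outputs

# Without checkpointing, storage grows quadratically with depth:
print(stored_feature_maps(12, checkpoint_bottleneck=False))  # → 90
# With the bottleneck checkpointed, only the O(L) new features remain:
print(stored_feature_maps(12, checkpoint_bottleneck=True))   # → 12
```

This is why `cp.checkpoint` wraps only `bn_function` in the snippet above: recomputing the cheap BN-ReLU-Conv in backward trades a little compute for eliminating the quadratic storage term.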