clovaai / rexnet

Official Pytorch implementation of ReXNet (Rank eXpansion Network) with pretrained models


GPU memory

vujadeyoon opened this issue · comments

Dear all,

Thanks for the nice work.
I have a question about GPU memory usage when running the code in inference mode.

I compared the trainable parameter counts and GPU memory usage of torchvision's ResNet50 and ReXNetV1 on a torch tensor of shape [1, 3, 1080, 1920]. ReXNetV1 has fewer trainable parameters, but it requires more GPU memory than ResNet50.
Please note that the ReXNetV1 parameter width_mult is 1.0.
GPU memory usage was checked with the nvidia-smi command while running the code below.

  • Model parameter

    • ReXNetV1: 4,796,873
    • ResNet50: 25,557,032
  • GPU memory

    • ReXNetV1: 9,723 MiB
    • ResNet50: 7,819 MiB
  • Experiment environment

    • OS: Ubuntu 16.04
    • torch version: 1.5.1
    • torchvision version: 0.6.1
    • GPU: single NVIDIA Titan RTX

Thus, I wonder why ReXNetV1 requires more memory than ResNet50; in particular, which module in ReXNetV1 appears to use the most memory.
For reference, the code used in the experiment is below.

import torch
import torchvision.models as models
import rexnetv1

# Select the model whose GPU memory usage should be measured, and comment out
# the unused one (e.g. # model = models.resnet50(pretrained=True).to('cuda:0')).
# Option 1: ReXNetV1
model = rexnetv1.ReXNetV1(width_mult=1.0).to('cuda:0')
model.load_state_dict(torch.load('./rexnetv1_1.0x.pth'))
# Option 2: ResNet50
# model = models.resnet50(pretrained=True).to('cuda:0')
model.eval()


x = torch.randn([1, 3, 1080, 1920], dtype=torch.float).to('cuda:0')

for idx in range(100):
    y = model(x)

print('Model params.: {}'.format(sum(p.numel() for p in model.parameters() if p.requires_grad)))
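As a side note, nvidia-smi also counts the CUDA context and PyTorch's caching allocator, so it can overstate what the model's tensors actually use. A sketch of a more precise measurement, using `torch.cuda.max_memory_allocated` and `torch.no_grad()` (shown here with a small stand-in model rather than ReXNet, and falling back to a plain forward pass on CPU):

```python
import torch


def peak_inference_memory(model, x):
    """Run one forward pass under no_grad and report peak tensor memory in
    bytes. On CPU, CUDA memory stats are unavailable, so None is returned."""
    model.eval()
    use_cuda = x.is_cuda
    if use_cuda:
        torch.cuda.reset_peak_memory_stats(x.device)
    with torch.no_grad():
        y = model(x)
    peak = torch.cuda.max_memory_allocated(x.device) if use_cuda else None
    return y, peak


# Small stand-in model for illustration (not ReXNet):
model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3, padding=1),
                            torch.nn.ReLU())
x = torch.randn(1, 3, 64, 64)
if torch.cuda.is_available():
    model, x = model.to('cuda:0'), x.to('cuda:0')

y, peak = peak_inference_memory(model, x)
print('output shape:', tuple(y.shape), '| peak allocated bytes:', peak)
```

Wrapping the loop in `torch.no_grad()` also matters on its own: without it, eval-mode forward passes still build the autograd graph and keep intermediate activations alive.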

Thank you for opening the issue. I think this comes from the Swish implementation (the EfficientNet implementation had the same issue, which was overcome by using a memory-efficient Swish). We implemented Swish in a straightforward way, and it could be implemented more efficiently.
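For reference, the memory-efficient Swish trick commonly used in EfficientNet implementations is a custom `torch.autograd.Function` that saves only the input and recomputes `sigmoid(x)` in the backward pass, instead of keeping the intermediate tensors from `x * torch.sigmoid(x)`. A sketch (not the exact ReXNet code):

```python
import torch


class SwishImplementation(torch.autograd.Function):
    """Swish(x) = x * sigmoid(x), storing only x for the backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * torch.sigmoid(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        sig = torch.sigmoid(x)
        # d/dx [x * sigmoid(x)] = sigmoid(x) * (1 + x * (1 - sigmoid(x)))
        return grad_output * (sig * (1 + x * (1 - sig)))


class MemoryEfficientSwish(torch.nn.Module):
    def forward(self, x):
        return SwishImplementation.apply(x)
```

Note this mainly saves memory during training; in pure inference under `torch.no_grad()`, no activations are retained for backward in either variant.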

I will take a further look at this issue.

It's a pretty late answer, but for the record: the memory consumption grows dramatically due to the use of depthwise convolutions. EfficientNet models have the same problem.
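More precisely, in inverted-bottleneck blocks (as used by ReXNet, MobileNetV2, and EfficientNet) the depthwise convolution operates on the expanded channel count, typically 6x the block's input channels, so at a large input resolution its activations dwarf those of the pointwise projections. One way to see which module dominates is to attach forward hooks that record each output's size; a sketch with a hypothetical toy block (the channel counts are illustrative, not ReXNet's exact configuration):

```python
import torch
import torch.nn as nn

# Toy inverted-bottleneck block: 1x1 expand (24 -> 144),
# 3x3 depthwise on 144 channels, 1x1 project (144 -> 24).
block = nn.Sequential(
    nn.Conv2d(24, 144, kernel_size=1, bias=False),     # expand
    nn.Conv2d(144, 144, kernel_size=3, padding=1,
              groups=144, bias=False),                 # depthwise
    nn.Conv2d(144, 24, kernel_size=1, bias=False),     # project
)

sizes = {}


def make_hook(name):
    def hook(module, inp, out):
        # Record the activation size of this module's output in bytes.
        sizes[name] = out.numel() * out.element_size()
    return hook


for name, m in block.named_children():
    m.register_forward_hook(make_hook(name))

with torch.no_grad():
    # Spatial size roughly matching a 1080x1920 input after 8x downsampling.
    block(torch.randn(1, 24, 135, 240))

for name, nbytes in sizes.items():
    print(name, f'{nbytes / 2**20:.1f} MiB')
```

The expanded activations before and after the depthwise convolution are 6x larger than the block's output, which is why high-resolution inputs inflate memory so much more for these architectures than for ResNet-style bottlenecks.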