clovaai / rexnet

Official Pytorch implementation of ReXNet (Rank eXpansion Network) with pretrained models


GPU memory

vujadeyoon opened this issue · comments

Dear all,

Thanks for the nice work.
I have a question about GPU memory usage when running the code in inference mode.

I compared the trainable parameter counts and GPU memory usage of torchvision's ResNet50 and ReXNetV1 on a torch tensor of shape [1, 3, 1080, 1920]. ReXNetV1 has fewer trainable parameters, but it requires more GPU memory than ResNet50.
Please note that the ReXNetV1 parameter width_mult is 1.0.
GPU memory usage was checked with the nvidia-smi command while running the code below.

  • Model parameter

    • ReXNetV1: 4,796,873
    • ResNet50: 25,557,032
  • GPU memory

    • ReXNetV1: 9,723 MiB
    • ResNet50: 7,819 MiB
  • Experiment environment

    • OS: Ubuntu 16.04
    • torch version: 1.5.1
    • torchvision version: 0.6.1
    • GPU: single NVIDIA Titan RTX

Thus, I wonder why ReXNetV1 requires more memory than ResNet50; in particular, which module in ReXNetV1 appears to use the most memory.
For reference, the code used in the experiment is below.

import torch
import torchvision.models as models
import rexnetv1

# Select the model whose GPU memory usage should be measured, and comment out
# the unused one (e.g. # model = models.resnet50(pretrained=True).to('cuda:0')).
# Option 1: ReXNetV1
model = rexnetv1.ReXNetV1(width_mult=1.0).to('cuda:0')
model.load_state_dict(torch.load('./rexnetv1_1.0x.pth'))
# Option 2: ResNet50
# model = models.resnet50(pretrained=True).to('cuda:0')
model.eval()


x = torch.randn([1, 3, 1080, 1920], dtype=torch.float).to('cuda:0')

for idx in range(100):
    y = model(x)

print('Model params.: {}'.format(sum(p.numel() for p in model.parameters() if p.requires_grad)))
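As a side note, nvidia-smi also counts the CUDA context and PyTorch's caching allocator, so it can overstate what the model's tensors actually use. A sketch of a more precise measurement, using `torch.cuda.max_memory_allocated` and `torch.no_grad()` (shown here with a small stand-in model rather than ReXNet, and falling back to a plain forward pass on CPU):

```python
import torch


def peak_inference_memory(model, x):
    """Run one forward pass under no_grad and report peak tensor memory in
    bytes. On CPU, CUDA memory stats are unavailable, so None is returned."""
    model.eval()
    use_cuda = x.is_cuda
    if use_cuda:
        torch.cuda.reset_peak_memory_stats(x.device)
    with torch.no_grad():
        y = model(x)
    peak = torch.cuda.max_memory_allocated(x.device) if use_cuda else None
    return y, peak


# Small stand-in model for illustration (not ReXNet):
model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3, padding=1),
                            torch.nn.ReLU())
x = torch.randn(1, 3, 64, 64)
if torch.cuda.is_available():
    model, x = model.to('cuda:0'), x.to('cuda:0')

y, peak = peak_inference_memory(model, x)
print('output shape:', tuple(y.shape), '| peak allocated bytes:', peak)
```

Wrapping the loop in `torch.no_grad()` also matters on its own: without it, eval-mode forward passes still build the autograd graph and keep intermediate activations alive.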

Thank you for opening the issue. I think this comes from the Swish implementation (the EfficientNet implementation had the same issue, which was overcome by using a memory-efficient Swish). We implemented Swish in a straightforward way, and it could be implemented more efficiently.
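For reference, the memory-efficient Swish trick commonly used in EfficientNet implementations is a custom `torch.autograd.Function` that saves only the input and recomputes `sigmoid(x)` in the backward pass, instead of keeping the intermediate tensors from `x * torch.sigmoid(x)`. A sketch (not the exact ReXNet code):

```python
import torch


class SwishImplementation(torch.autograd.Function):
    """Swish(x) = x * sigmoid(x), storing only x for the backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * torch.sigmoid(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        sig = torch.sigmoid(x)
        # d/dx [x * sigmoid(x)] = sigmoid(x) * (1 + x * (1 - sigmoid(x)))
        return grad_output * (sig * (1 + x * (1 - sig)))


class MemoryEfficientSwish(torch.nn.Module):
    def forward(self, x):
        return SwishImplementation.apply(x)
```

Note this mainly saves memory during training; in pure inference under `torch.no_grad()`, no activations are retained for backward in either variant.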

I will take a further look at this issue.

It's a pretty late answer, but for the record: the memory consumption grows dramatically due to the use of depthwise convolutions. EfficientNet models have the same problem.
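More precisely, in inverted-bottleneck blocks (as used by ReXNet, MobileNetV2, and EfficientNet) the depthwise convolution operates on the expanded channel count, typically 6x the block's input channels, so at a large input resolution its activations dwarf those of the pointwise projections. One way to see which module dominates is to attach forward hooks that record each output's size; a sketch with a hypothetical toy block (the channel counts are illustrative, not ReXNet's exact configuration):

```python
import torch
import torch.nn as nn

# Toy inverted-bottleneck block: 1x1 expand (24 -> 144),
# 3x3 depthwise on 144 channels, 1x1 project (144 -> 24).
block = nn.Sequential(
    nn.Conv2d(24, 144, kernel_size=1, bias=False),     # expand
    nn.Conv2d(144, 144, kernel_size=3, padding=1,
              groups=144, bias=False),                 # depthwise
    nn.Conv2d(144, 24, kernel_size=1, bias=False),     # project
)

sizes = {}


def make_hook(name):
    def hook(module, inp, out):
        # Record the activation size of this module's output in bytes.
        sizes[name] = out.numel() * out.element_size()
    return hook


for name, m in block.named_children():
    m.register_forward_hook(make_hook(name))

with torch.no_grad():
    # Spatial size roughly matching a 1080x1920 input after 8x downsampling.
    block(torch.randn(1, 24, 135, 240))

for name, nbytes in sizes.items():
    print(name, f'{nbytes / 2**20:.1f} MiB')
```

The expanded activations before and after the depthwise convolution are 6x larger than the block's output, which is why high-resolution inputs inflate memory so much more for these architectures than for ResNet-style bottlenecks.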