albumentations-team / autoalbument

AutoML for image augmentation. AutoAlbument uses the Faster AutoAugment algorithm to find optimal augmentation policies. Documentation - https://albumentations.ai/docs/autoalbument/

Home Page: https://albumentations.ai/docs/autoalbument/


Performance seems to be very low

scribblepad opened this issue · comments

I'm trying out AutoAlbument for a semantic segmentation task with the default generated search.yaml.
The custom dataset has around 29,000 RGB images and corresponding masks (height x width: 512 x 512). I'm running on a single A100 GPU with a batch size of 8, the maximum I can fit in memory without OOM errors. The GPU appears well utilized; utilization fluctuates between 35% and 100%.

Issue

Based on the output below, the approximate time to complete autoalbument-search is close to 5 days (for 20 epochs), which seems far too high. Is there a more optimized way to obtain the augmentation policies that AutoAlbument generates? Running it for 5 continuous days is too expensive.
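
For context, the back-of-the-envelope arithmetic behind that estimate (my own sanity check, assuming one full pass over the dataset per epoch):

images = 29_000          # training images
batch_size = 8
epochs = 20

iters_per_epoch = images // batch_size    # 3625 iterations per epoch
total_iters = iters_per_epoch * epochs    # 72,500 iterations total
# A 5-day wall clock implies roughly 6 seconds per iteration:
print(5 * 24 * 3600 / total_iters)        # ~5.96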

Current Output of autoalbument-search:
[screenshot: autoalbument-search progress output]

Segments from search.yaml:

architecture: Unet
encoder_architecture: resnet18
pretrained: true

dataloader:
  _target_: torch.utils.data.DataLoader
  batch_size: 8
  shuffle: true
  num_workers: 16
  pin_memory: true
  drop_last: true
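
Before assuming the search itself is the bottleneck, it may be worth timing the input pipeline on its own. A minimal sketch (the synthetic dataset below is a hypothetical stand-in; swap in the real dataset class that search.yaml points at to get meaningful numbers):

import time
import torch
from torch.utils.data import DataLoader, Dataset

class SyntheticSegmentationDataset(Dataset):
    # Hypothetical stand-in mimicking the shapes from this issue:
    # 512x512 RGB images with corresponding masks.
    def __len__(self):
        return 29_000

    def __getitem__(self, idx):
        image = torch.randint(0, 256, (3, 512, 512), dtype=torch.uint8)
        mask = torch.randint(0, 2, (512, 512), dtype=torch.uint8)
        return image, mask

if __name__ == "__main__":  # guard needed for num_workers > 0 on spawn platforms
    loader = DataLoader(SyntheticSegmentationDataset(), batch_size=8, shuffle=True,
                        num_workers=16, pin_memory=True, drop_last=True)
    start = time.perf_counter()
    for i, _ in enumerate(loader):
        if i == 99:  # time the first 100 batches
            break
    print(f"{(time.perf_counter() - start) / 100:.3f} s per batch from the loader alone")

If the loader alone is anywhere near the ~6 s/iteration implied by the 5-day estimate, a faster GPU won't help; if it's fast, the cost sits in the search loop itself.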

Same here. Trying the cifar10 example, and it's taking at least 5 s/iteration; at 390 iterations/epoch that's about half an hour per epoch. I can only imagine how slow it will be on my x-ray classification task with 6000 high-res images.

I'm going to wait for the result out of curiosity, but otherwise this isn't usable. I'll look into Faster AutoAugment, which this is based on, or even the older RandAugment or AutoAugment.

Running the same cifar10 example with a batch size of 128 on an RTX 2070 (6 GB) takes the same amount of time per iteration as using a 24 GB RTX 3090 with a batch size of 640. I think there's something in their code limiting how quickly the iterations happen.
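
One way to check that hypothesis: time pure GPU compute at both batch sizes with a throwaway model (a generic PyTorch sketch, not AutoAlbument's actual network). If GPU time scales with batch size while autoalbument-search iteration time stays flat, the fixed cost is on the CPU side (data loading or policy sampling) rather than on the GPU:

import time
import torch
import torch.nn as nn

device = "cuda"
# Throwaway conv net; we only care how step time scales with batch size.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
).to(device)

for batch_size in (128, 640):
    x = torch.randn(batch_size, 3, 32, 32, device=device)  # cifar10-sized inputs
    model(x).sum().backward()  # warm-up: exclude one-time CUDA/cuDNN setup
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(50):
        model(x).sum().backward()
    torch.cuda.synchronize()
    ms = (time.perf_counter() - start) / 50 * 1e3
    print(f"batch {batch_size}: {ms:.1f} ms per forward+backward")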