bowang-lab / U-Mamba

U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation

Home Page: https://arxiv.org/abs/2401.04722

Slow training time (can be fixed)

FabianIsensee opened this issue

Hi Jun,

awesome work! While playing with your repo I noticed that training times are WAY slower than they should be. When using the regular nnUNetTrainer, an epoch on Hippocampus takes 22s instead of 7-8s (on an RTX 4090), even though none of the Mamba components should be involved.

I traced this back to the way you install PyTorch. I recommend changing the instructions to
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
(taken straight from the PyTorch website; CUDA 11.8 is important, as the setup won't work with CUDA 12 because of the causal conv dependency)
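
As a quick sanity check after reinstalling, this one-liner confirms which build actually ended up in the environment (a generic check, assuming a standard PyTorch install; not part of the original report):

# prints the torch version, the CUDA toolkit it was built against, and GPU availability
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"

The second field should read 11.8; if it prints 12.x or None, the environment did not pick up the intended CUDA build.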

This has the following effect for me:

  • regular nnUNetTrainer on Hippocampus goes from 22s -> 7.5s per epoch
  • nnUNetTrainerUMambaEnc goes from >60s to 24s
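
For anyone who wants to reproduce this comparison, the two trainings correspond to commands along these lines, sketched with the standard nnU-Net v2 CLI (the dataset ID 4 for MSD Hippocampus and fold 0 are assumptions; adjust them to your setup). Epoch times are printed in the training log:

# baseline trainer, no Mamba layers involved
nnUNetv2_train 4 3d_fullres 0 -tr nnUNetTrainer
# U-Mamba encoder trainer from this repo
nnUNetv2_train 4 3d_fullres 0 -tr nnUNetTrainerUMambaEnc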

Note that I only verified that the trainings run; please make sure everything works as expected before changing the instructions :-)

Best,
Fabian

commented

Hi @FabianIsensee ,

Happy New Year!

Thank you so much for the valuable comments. We will do a thorough evaluation under the new environment before making the update.

Best regards,
Jun

Any update?

commented

I am looking forward to the update. Currently training is very slow, taking about 5x longer on almost any task (BTCV, ACDC, Synapse, etc.), regardless of the 2d or 3d_fullres configuration.