zezhishao / STEP

Code for our SIGKDD'22 paper Pre-training-Enhanced Spatial-Temporal Graph Neural Network For Multivariate Time Series Forecasting.

GPU memory issue

KL4805 opened this issue

Dear author,

Thanks for your open-source code!

I am currently trying to run

python step/run.py --cfg='step/STEP_METR-LA.py' --gpus='0,1'

on two V100 GPUs (32 GB memory each). However, the program fails with an out-of-memory error. I don't think this is the expected behavior. Could you please tell me what the normal GPU memory usage is? Thanks.

commented

Thanks for your attention~

I just tested the latest version of STEP with two V100 GPUs (32GB memory), and the program seems to work fine.

Here is the nvidia-smi output:
[image: nvidia-smi screenshot]

Thanks for the quick reply. What batch size did you use (as specified in STEP_METR-LA.py)?

I used a batch size of 8, and nvidia-smi reports that 22517 MB is in use on each V100.
[image: nvidia-smi screenshot]
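
For anyone comparing numbers in text rather than screenshots, here is a minimal sketch of reading per-GPU memory from `nvidia-smi`'s CSV query mode. The helper names `parse_gpu_memory` and `query_gpu_memory` are hypothetical and not part of STEP, and the total-memory value in the sample string is illustrative only.

```python
import subprocess

# Real nvidia-smi query flags: one CSV row per GPU, values in MiB, no header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse rows of the form 'index, used_mib, total_mib' into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        index, used, total = (field.strip() for field in line.split(","))
        rows.append(
            {"index": int(index), "used_mib": int(used), "total_mib": int(total)}
        )
    return rows


def query_gpu_memory():
    """Run nvidia-smi (must be on PATH) and return the parsed per-GPU memory usage."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


# Illustrative input only: the used value is the one reported in this thread,
# the total value is an assumed placeholder.
sample = "0, 22517, 32510\n1, 22517, 32510\n"
for gpu in parse_gpu_memory(sample):
    print(f"GPU {gpu['index']}: {gpu['used_mib']} / {gpu['total_mib']} MiB")
```

On a machine with the NVIDIA driver installed, calling `query_gpu_memory()` while training is running gives the same figures that the screenshots show.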

commented

That's strange. I used the code cloned from GitHub without any modification, i.e., batch_size is set to 32.

I will first try to figure out the problem myself. Thanks for the reference GPU usage.

commented

You're welcome. In addition, my PyTorch version is 1.10.0 with CUDA 11.1.

It turns out that I had misconfigured something when loading the pre-trained transformer. After I fixed it, the memory usage is fine.

Thank you for your help.

Could you tell me what you did to fix the memory usage? Which setting did you change? Thank you very much.