Unable to proceed, no GPU resources available

Question

bpm246 opened this issue 3 years ago

We are trying to run the model on our own server, and we get this error:
RuntimeError: Unable to proceed, no GPU resources available

Peter Albert · Answer 1 · Tue Apr 27 2021 18:45:17 GMT+0800 (China Standard Time)

Hi!
This means DeepSpeed is unable to find your GPU. Your CUDA version may be incompatible with your PyTorch build, or your drivers may not be installed correctly. You should also have PyTorch version 1.7.

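Run these commands in a Python session to check whether PyTorch recognizes your GPU:
import torch
print(torch.cuda.is_available())      # True if PyTorch can use a CUDA GPU
# The next two calls raise an error when no GPU is available.
print(torch.cuda.current_device())    # index of the currently selected GPU
print(torch.cuda.get_device_name(0))  # name of GPU 0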
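
If you would rather not hit an exception on a machine without a working GPU, the same checks can be wrapped in a guard. This is only a minimal sketch: the function name `describe_gpu` and the dictionary keys are made up for illustration, and it assumes nothing beyond PyTorch possibly being installed.

```python
def describe_gpu():
    """Return a dict describing what PyTorch can see, without raising."""
    info = {"torch_installed": False, "cuda_available": False, "devices": []}
    try:
        import torch
    except ImportError:
        # PyTorch is missing from this environment, so report defaults.
        return info
    info["torch_installed"] = True
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        # Only query device names once we know at least one GPU is usable.
        for index in range(torch.cuda.device_count()):
            info["devices"].append(torch.cuda.get_device_name(index))
    return info


print(describe_gpu())
```

An empty `devices` list together with `cuda_available` set to False corresponds to the situation that triggers the DeepSpeed error above.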

bpm246 · Answer 2 · Tue Apr 27 2021 19:25:58 GMT+0800 (China Standard Time)

We ran your commands:
import torch
torch.cuda.is_available()
and the result is False.

We have PyTorch 1.8.1 and CUDA 10.2. We tried to install PyTorch 1.7.0, but we got this error:
ERROR: Could not find a version that satisfies the requirement torch==1.7.0
ERROR: No matching distribution found for torch==1.7.0
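
One common cause of this pip message is that no package published for that version fits the running interpreter (Python version, operating system, or 32-bit versus 64-bit). Whether that is the cause here is not stated in the thread. As a rough sketch, the details pip matches against can be printed with the standard library alone; the function name `interpreter_summary` is hypothetical.

```python
import platform
import struct
import sys


def interpreter_summary():
    """Collect the interpreter details that pip matches packages against."""
    return {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "bits": struct.calcsize("P") * 8,  # pointer size: 32 or 64
        "machine": platform.machine(),     # e.g. x86_64, AMD64, arm64
        "system": platform.system(),       # e.g. Linux, Windows, Darwin
    }


print(interpreter_summary())
```

Comparing this output with the files listed for the release on the package index shows whether a matching build exists at all.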

Peter Albert · Answer 3 · Tue Apr 27 2021 21:43:11 GMT+0800 (China Standard Time)

It should also work with PyTorch 1.8; you don't need to install PyTorch 1.7.
If torch.cuda.is_available() returns False, PyTorch can't detect your GPU. This probably means that your GPU or your graphics driver doesn't support the CUDA version that you are using. Here is a relevant thread:
https://stackoverflow.com/questions/60987997/why-torch-cuda-is-available-returns-false-even-after-installing-pytorch-with
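
To narrow this down, it can help to check which CUDA version the installed PyTorch was built with, since a build without CUDA support also makes torch.cuda.is_available() return False. The sketch below is an illustration, not an official diagnostic: `cuda_build_report` is a made-up name, and it relies only on `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.device_count()`.

```python
def cuda_build_report():
    """Report which CUDA version, if any, the installed PyTorch was built with."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment."
    # None for builds without CUDA support (for example CPU-only builds).
    built_with = torch.version.cuda
    if built_with is None:
        return "PyTorch {} was built without CUDA support.".format(torch.__version__)
    if torch.cuda.is_available():
        return "PyTorch {} (CUDA {}) sees {} GPU(s).".format(
            torch.__version__, built_with, torch.cuda.device_count()
        )
    return "PyTorch {} was built for CUDA {}, but no usable GPU or driver was found.".format(
        torch.__version__, built_with
    )


print(cuda_build_report())
```

If the report names a CUDA version but finds no GPU, compare that version with what the installed driver supports (for example, the version shown by nvidia-smi).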