deep-learning-with-pytorch / dlwpt-code

Code for the book Deep Learning with PyTorch by Eli Stevens, Luca Antiga, and Thomas Viehmann.

Home Page: https://www.manning.com/books/deep-learning-with-pytorch

GPU not utilized

alexhuang2020 opened this issue

I have a GTX 1050 Ti, and I checked: `torch.cuda.device_count()` returns 1 and `torch.cuda.is_available()` returns True. However, nvidia-smi reports 0% GPU usage when I run `run('training.LunaTrainingApp', '--epochs=1')`.

On the other hand, when I run the following, GPU utilization climbs to nearly 100%:

```
import torch

a = torch.rand(20000, 20000).cuda()
while True:
    a += 1
    a -= 1
```

What could be the problem?
Thanks.

I then loaded 10,000 samples into Google Colab with a GPU runtime and ran `run('training.LunaTrainingApp', '--epochs=1')`; GPU utilization is also 0% there.
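
One way to narrow this down is to check, right before the forward pass, where the model parameters and the current batch actually live; if either still reports `cpu`, the GPU will sit idle. A minimal, self-contained sketch of that check (the `model` and `batch` names are stand-ins here, not the book's actual loop variables):

```
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(16, 2)    # stand-in for LunaModel
batch = torch.randn(8, 16)  # stand-in for one batch from the DataLoader

# Both prints show "cpu" here because nothing has been moved yet; the same
# two prints dropped into the real training loop reveal whether the model
# and the batches ever actually reach the GPU.
print(next(model.parameters()).device)
print(batch.device)
```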

The code in the `initModel` function in training.py is wrong.
Current code:

```
def initModel(self):
    model = LunaModel()
    if self.use_cuda:
        log.info("Using CUDA; {} devices.".format(torch.cuda.device_count()))
        if torch.cuda.device_count() > 1:
            model = nn.DataParallel(model)  # This doesn't work; the model stays on the CPU.
        model = model.to(self.device)  # Not exactly sure what this is doing.
    return model
```
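
For reference on that last call: `Module.to(device)` moves all of the module's parameters and buffers onto the given device and returns the module. A minimal, self-contained sketch with a stand-in `nn.Linear` in place of LunaModel:

```
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
print(next(model.parameters()).device)       # cpu: freshly built modules live on the CPU

if torch.cuda.is_available():
    model = model.to(torch.device('cuda:0'))
    print(next(model.parameters()).device)   # cuda:0: parameters and buffers were moved
```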

`initModel` needs to be:

```
def initModel(self):
    model = LunaModel()
    if self.use_cuda:
        log.info("Using CUDA; {} devices.".format(torch.cuda.device_count()))
        model = model.to(self.device)  # Move the model to the GPU before wrapping it in DataParallel.
        if torch.cuda.device_count() > 1:
            model = nn.DataParallel(model, device_ids=range(torch.cuda.device_count()))  # Pass the device list explicitly.
    return model
```
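
To sanity-check that ordering end to end, here is a self-contained sketch that mirrors it with a stand-in module (not the book's LunaModel) and runs one forward pass; the output should report a CUDA device whenever a GPU is available:

```
import torch
import torch.nn as nn

class TinyModel(nn.Module):  # stand-in for LunaModel
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 2)

    def forward(self, x):
        return self.fc(x)

use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

model = TinyModel()
if use_cuda:
    model = model.to(device)  # move to the GPU first
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model, device_ids=list(range(torch.cuda.device_count())))

x = torch.randn(4, 8).to(device)
print(model(x).device)  # cuda:0 when a GPU is available, cpu otherwise
```

(Aside: when `device_ids` is omitted, `nn.DataParallel` defaults to all visible GPUs.)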