deep-learning-with-pytorch / dlwpt-code

Code for the book Deep Learning with PyTorch by Eli Stevens, Luca Antiga, and Thomas Viehmann.

Home Page: https://www.manning.com/books/deep-learning-with-pytorch

GPU not utilized

alexhuang2020 opened this issue

I have a GTX 1050 Ti, and I checked: `torch.cuda.device_count()` returns 1 and `torch.cuda.is_available()` returns True. However, nvidia-smi reports 0% GPU usage when I run `run('training.LunaTrainingApp', '--epochs=1')`.

On the other hand, when I run the following, GPU utilization climbs to nearly 100%:

```
import torch

a = torch.rand(20000, 20000).cuda()
while True:
    a += 1
    a -= 1
```

What could be the problem?
Thanks.

I then loaded 10,000 samples into Google Colab with a GPU runtime and ran `run('training.LunaTrainingApp', '--epochs=1')`; GPU utilization is also 0% there.
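
One way to narrow this down is to check, right before the forward pass, where the model parameters and the current batch actually live; if either still reports `cpu`, the GPU will sit idle. A minimal, self-contained sketch of that check (the `model` and `batch` names are stand-ins here, not the book's actual loop variables):

```
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(16, 2)    # stand-in for LunaModel
batch = torch.randn(8, 16)  # stand-in for one batch from the DataLoader

# Both prints show "cpu" here because nothing has been moved yet; the same
# two prints dropped into the real training loop reveal whether the model
# and the batches ever actually reach the GPU.
print(next(model.parameters()).device)
print(batch.device)
```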

The code in the `initModel` function in training.py is wrong.
Current code:

```
def initModel(self):
    model = LunaModel()
    if self.use_cuda:
        log.info("Using CUDA; {} devices.".format(torch.cuda.device_count()))
        if torch.cuda.device_count() > 1:
            model = nn.DataParallel(model)  # This doesn't work; the model stays on the CPU.
        model = model.to(self.device)  # Not exactly sure what this is doing.
    return model
```
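
For reference on that last call: `Module.to(device)` moves all of the module's parameters and buffers onto the given device and returns the module. A minimal, self-contained sketch with a stand-in `nn.Linear` in place of LunaModel:

```
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
print(next(model.parameters()).device)       # cpu: freshly built modules live on the CPU

if torch.cuda.is_available():
    model = model.to(torch.device('cuda:0'))
    print(next(model.parameters()).device)   # cuda:0: parameters and buffers were moved
```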

`initModel` needs to be:

```
def initModel(self):
    model = LunaModel()
    if self.use_cuda:
        log.info("Using CUDA; {} devices.".format(torch.cuda.device_count()))
        model = model.to(self.device)  # Move the model to the GPU before wrapping it in DataParallel.
        if torch.cuda.device_count() > 1:
            model = nn.DataParallel(model, device_ids=range(torch.cuda.device_count()))  # Pass the device list explicitly.
    return model
```
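
To sanity-check that ordering end to end, here is a self-contained sketch that mirrors it with a stand-in module (not the book's LunaModel) and runs one forward pass; the output should report a CUDA device whenever a GPU is available:

```
import torch
import torch.nn as nn

class TinyModel(nn.Module):  # stand-in for LunaModel
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 2)

    def forward(self, x):
        return self.fc(x)

use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

model = TinyModel()
if use_cuda:
    model = model.to(device)  # move to the GPU first
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model, device_ids=list(range(torch.cuda.device_count())))

x = torch.randn(4, 8).to(device)
print(model(x).device)  # cuda:0 when a GPU is available, cpu otherwise
```

(Aside: when `device_ids` is omitted, `nn.DataParallel` defaults to all visible GPUs.)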