Cannot train on GPU
ginacode opened this issue
When I run ProGAN with the GPU build of PyTorch, I get:
Starting the training process ...
Currently working on Depth: 0
Current resolution: 4 x 4
Epoch: 1
Traceback (most recent call last):
  File "progan.py", line 39, in <module>
    feedback_factor=2
  File "/scratch2/virtualenv/lib/python3.7/site-packages/pro_gan_pytorch/PRO_GAN.py", line 1046, in train
    labels, current_depth, alpha)
  File "/scratch2/virtualenv/lib/python3.7/site-packages/pro_gan_pytorch/PRO_GAN.py", line 865, in optimize_discriminator
    labels, depth, alpha)
  File "/scratch2/virtualenv/lib/python3.7/site-packages/pro_gan_pytorch/Losses.py", line 345, in dis_loss
    fake_out = self.dis(fake_samps, labels, height, alpha)
  File "/scratch2/virtualenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch2/virtualenv/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/scratch2/virtualenv/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/scratch2/virtualenv/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
    raise output
  File "/scratch2/virtualenv/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
    output = module(*input, **kwargs)
  File "/scratch2/virtualenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch2/virtualenv/lib/python3.7/site-packages/pro_gan_pytorch/PRO_GAN.py", line 305, in forward
    out = self.final_block(y, labels)
  File "/scratch2/virtualenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch2/virtualenv/lib/python3.7/site-packages/pro_gan_pytorch/CustomLayers.py", line 445, in forward
    labels = self.label_embedder(labels) # [B x C]
  File "/scratch2/virtualenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch2/virtualenv/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 117, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/scratch2/virtualenv/lib/python3.7/site-packages/torch/nn/functional.py", line 1506, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: diff_view_meta->output_nr_ == 0 ASSERT FAILED at /pytorch/torch/csrc/autograd/variable.cpp:209, please report a bug to PyTorch.
When I run it with the CPU-only build of PyTorch, it works, but very slowly. Any idea what could be causing this, and is there a way to train with GPU support?
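One way to narrow this down, offered as a guess rather than a confirmed fix: the traceback passes through `parallel_apply` in `torch/nn/parallel/data_parallel.py`, which suggests the model was replicated across more than one GPU. Exposing a single GPU to the process before PyTorch is imported would show whether the multi-GPU path is involved. The device index `"0"` and the helper name `visible_gpu_ids` are assumptions for illustration.

```python
import os

# Assumption: GPU index "0" exists and is usable on this machine.
# This must be set before `import torch` so that CUDA only ever sees one device.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"


def visible_gpu_ids():
    """Return the GPU ids that CUDA will expose to this process."""
    value = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return [part.strip() for part in value.split(",") if part.strip()]


# Sanity check before starting the (unchanged) training script.
print("GPUs visible to this process:", visible_gpu_ids())
```

If the assertion error disappears with one visible GPU, the problem is likely tied to multi-GPU replication rather than to the dataset or the image size. This may or may not resolve the issue.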
This is the code I am using, by the way. I am trying to train on 1024x512 images.
import torch as th
import pro_gan_pytorch.PRO_GAN as pg
import matplotlib.pyplot as plt
import os
from torchvision import datasets, transforms
from PIL import Image, ImageChops

device = th.device("cuda" if th.cuda.is_available() else "cpu")


def setup_data():
    dataset = datasets.ImageFolder(
        root='total_intensity/',
        transform=transforms.Compose([
            transforms.Resize((512, 512)),
            transforms.ToTensor(),
            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]))
    return dataset


if __name__ == '__main__':
    depth = 8
    num_epochs = [50, 50, 50, 50, 50, 50, 50, 50]
    fade_ins = [50, 50, 50, 50, 50, 50, 50, 50]
    batch_sizes = [32, 32, 32, 32, 32, 32, 32, 32]
    latent_size = 512
    dataset = setup_data()
    pro_gan = pg.ConditionalProGAN(num_classes=1, depth=depth,
                                   latent_size=latent_size, device=device)
    pro_gan.train(
        dataset=dataset,
        epochs=num_epochs,
        fade_in_percentage=fade_ins,
        batch_sizes=batch_sizes,
        feedback_factor=2
    )
Unfortunately, the network architecture doesn't support non-square images such as the 1024 x 512 ones you are using. Could you try padding the second dimension to 1024, so that the images are square with a side length that is a power of 2 greater than 4?
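A minimal sketch of how the padding amounts could be computed, assuming the goal is a centred image on a square canvas. The helper names `next_power_of_two` and `square_padding` are hypothetical and not part of pro_gan_pytorch; the default minimum side of 8 reflects the "power of 2 greater than 4" requirement.

```python
def next_power_of_two(n):
    """Return the smallest power of two that is >= n (n must be positive)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 1 << (n - 1).bit_length()


def square_padding(width, height, min_side=8):
    """Return (side, (left, top, right, bottom)) padding for a square canvas.

    `side` is a power of two that fits the larger dimension; the padding is
    split as evenly as possible so the original image stays centred.
    """
    side = max(min_side, next_power_of_two(max(width, height)))
    pad_w = side - width
    pad_h = side - height
    left, top = pad_w // 2, pad_h // 2
    return side, (left, top, pad_w - left, pad_h - top)


# For a 1024 x 512 image the shorter dimension gains 256 pixels on each side.
print(square_padding(1024, 512))  # (1024, (0, 256, 0, 256))
```

The 4-tuple follows the (left, top, right, bottom) order that `torchvision.transforms.Pad` accepts, so it could be passed there in place of a stretching `Resize`, provided all images in the dataset share the same size.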
Please let me know if you have any other problems.
cheers 🍻!
@akanimax
As far as I can tell, I am already resizing the images to 512 x 512 before running ProGAN (see setup_data()).
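A quick consistency check between the resize target and the `depth` setting can be sketched as follows. It assumes that the resolution starts at 4 x 4 for depth index 0 (as the log above shows) and doubles at every subsequent depth; the doubling and the helper names `resolution_at_depth` and `depth_for_resolution` are assumptions for illustration, not library API.

```python
def resolution_at_depth(depth_index, base=4):
    """Side length generated at a given depth index (0 corresponds to base x base)."""
    return base * 2 ** depth_index


def depth_for_resolution(resolution, base=4):
    """Number of depth stages needed to reach `resolution` from `base`.

    Raises ValueError when the resolution is not a power of two >= base.
    """
    if resolution < base or resolution & (resolution - 1):
        raise ValueError("resolution must be a power of two >= base")
    return (resolution // base).bit_length()


# The posted script uses Resize((512, 512)) together with depth = 8.
print(depth_for_resolution(512))  # 8
print(resolution_at_depth(7))     # 512
```

Under these assumptions, `depth = 8` and 512 x 512 inputs agree with each other, which suggests the image size alone does not explain the error.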