pytorch / examples

A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.

Home Page: https://pytorch.org/examples

Running on Windows

maxbonzulak opened this issue

📚 Documentation

I'm trying to get DCGAN running on my Windows machine. It appears that the code may not support Windows, but this is not mentioned in the README. Is there a procedure to get it running on Windows?

@maxbonzulak could you provide more info?

@hudeven Of course.

After cloning this repo and downloading the LSUN data, I'm running this command:
python main.py --dataset lsun --dataroot lsun --cuda --ngpu 2

That is resulting in this error:

Namespace(dataset='lsun', dataroot='lsun', workers=2, batchSize=64, imageSize=64, nz=100, ngf=64, ndf=64, niter=25, lr=0.0002, beta1=0.5, cuda=True, dry_run=False, ngpu=2, netG='', netD='', outf='.', manualSeed=None, classes='bedroom')
Random Seed:  177
Generator(
  (main): Sequential(
    (0): ConvTranspose2d(100, 512, kernel_size=(4, 4), stride=(1, 1), bias=False)
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): ConvTranspose2d(512, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU(inplace=True)
    (6): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (7): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (8): ReLU(inplace=True)
    (9): ConvTranspose2d(128, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (10): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (11): ReLU(inplace=True)
    (12): ConvTranspose2d(64, 3, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (13): Tanh()
  )
)
Discriminator(
  (main): Sequential(
    (0): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (1): LeakyReLU(negative_slope=0.2, inplace=True)
    (2): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (3): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (4): LeakyReLU(negative_slope=0.2, inplace=True)
    (5): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (6): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): LeakyReLU(negative_slope=0.2, inplace=True)
    (8): Conv2d(256, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (9): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (10): LeakyReLU(negative_slope=0.2, inplace=True)
    (11): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1), bias=False)
    (12): Sigmoid()
  )
)
Traceback (most recent call last):
  File "C:\Users\Windows\Documents\examples\dcgan\main.py", line 219, in <module>
    for i, data in enumerate(dataloader, 0):
  File "C:\Users\Windows\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 444, in __iter__
    return self._get_iterator()
  File "C:\Users\Windows\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 390, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\Windows\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 1077, in __init__
    w.start()
  File "C:\Users\Windows\anaconda3\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\Windows\anaconda3\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\Windows\anaconda3\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\Users\Windows\anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\Windows\anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'Environment' object
PS C:\Users\Windows\Documents\examples\dcgan> Namespace(dataset='lsun', dataroot='lsun', workers=2, batchSize=64, imageSize=64, nz=100, ngf=64, ndf=64, niter=25, lr=0.0002, beta1=0.5, cuda=True, dry_run=False, ngpu=2, netG='', netD='', outf='.', manualSeed=None, classes='bedroom')
Random Seed:  7161
Generator(
  (main): Sequential(
    (0): ConvTranspose2d(100, 512, kernel_size=(4, 4), stride=(1, 1), bias=False)
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): ConvTranspose2d(512, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU(inplace=True)
    (6): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (7): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (8): ReLU(inplace=True)
    (9): ConvTranspose2d(128, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (10): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (11): ReLU(inplace=True)
    (12): ConvTranspose2d(64, 3, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (13): Tanh()
  )
)
Discriminator(
  (main): Sequential(
    (0): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (1): LeakyReLU(negative_slope=0.2, inplace=True)
    (2): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (3): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (4): LeakyReLU(negative_slope=0.2, inplace=True)
    (5): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (6): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): LeakyReLU(negative_slope=0.2, inplace=True)
    (8): Conv2d(256, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (9): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (10): LeakyReLU(negative_slope=0.2, inplace=True)
    (11): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1), bias=False)
    (12): Sigmoid()
  )
)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\Windows\anaconda3\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\Windows\anaconda3\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "C:\Users\Windows\anaconda3\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\Windows\anaconda3\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "C:\Users\Windows\anaconda3\lib\runpy.py", line 268, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Users\Windows\anaconda3\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\Windows\anaconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Windows\Documents\examples\dcgan\main.py", line 219, in <module>
    for i, data in enumerate(dataloader, 0):
  File "C:\Users\Windows\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 444, in __iter__
    return self._get_iterator()
  File "C:\Users\Windows\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 390, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\Windows\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 1077, in __init__
    w.start()
  File "C:\Users\Windows\anaconda3\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\Windows\anaconda3\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\Windows\anaconda3\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\Users\Windows\anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\Windows\anaconda3\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\Windows\anaconda3\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

I am only able to get the program running by setting the number of workers to 0:
python main.py --dataset lsun --dataroot lsun --cuda --ngpu 2 --workers 0

I'm running on Windows 11. Please let me know if there are any other relevant specs I should provide.

FWIW, distributed support for PyTorch on Windows isn't great; I would suggest dual-booting Linux or getting a cloud instance.

How about a Colab instance? I would like to run Style Transfer there. Is there one already available?

Thank you very much in advance, as this would make it much easier to use.

AFAIK Colab only provides a single GPU, so for any sort of distributed training you fundamentally need to either provision a multi-GPU machine from your favorite cloud provider or build one yourself. Unfortunately, there's no low-barrier option.

@steenblikrs I'd like to help you, but I can't test it now because I don't want to download 42 GB of data. However, this error looks very familiar, and in my experience it is just a multiprocessing + DataLoader issue.

The way to fix this is to put everything into a def main(): function and then call it from a main guard:

if __name__ == "__main__":
    main()

That way, when the DataLoader worker processes are started (on Windows they are spawned rather than forked, so each one re-imports main.py), the training code won't run again inside them and you won't run into this issue.
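A minimal sketch of that structure, assuming a small random TensorDataset as a hypothetical stand-in for the LSUN data. In the real dcgan/main.py, the argument parsing, dataset, models, and training loop (the `for i, data in enumerate(dataloader, 0)` loop from the traceback) would all move inside main(); the point here is only that nothing which starts DataLoader workers runs at import time.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def build_dataloader(num_workers: int) -> DataLoader:
    # Hypothetical stand-in for the LSUN dataset: 64 random 3x64x64
    # "images" with dummy labels.
    images = torch.randn(64, 3, 64, 64)
    labels = torch.zeros(64, dtype=torch.long)
    dataset = TensorDataset(images, labels)
    return DataLoader(dataset, batch_size=16, shuffle=True, num_workers=num_workers)


def main(num_workers: int = 2) -> int:
    # Everything that used to run at module level lives in here instead.
    dataloader = build_dataloader(num_workers)
    num_batches = 0
    for i, data in enumerate(dataloader, 0):
        real_images = data[0]  # shape: (batch, 3, 64, 64)
        # The discriminator/generator updates would go here.
        num_batches += 1
    return num_batches


if __name__ == "__main__":
    # With the spawn start method (the default on Windows), every DataLoader
    # worker re-imports this file. The guard stops those imports from calling
    # main() again and trying to start workers of their own.
    print(f"processed {main()} batches")
```

As the error message itself notes, the freeze_support() call is only needed if the script will be frozen into an executable. Running with --workers 0 avoids the problem entirely because no worker processes are started, at the cost of loading data in the main process.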

https://stackoverflow.com/questions/64654838/pytorch-tutorial-freeze-support-issue

There are many threads online about this issue if you search for them. My guess is that once everything is inside main() behind the if __name__ == "__main__": guard, it will run fine.
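The guard addresses the RuntimeError raised in the child process, but the first traceback (TypeError: cannot pickle 'Environment' object) is likely a separate problem. With spawn, the parent pickles the dataset and sends it to each worker (the reduction.dump frame in the traceback), and the open LMDB environment that the LSUN dataset presumably holds cannot be pickled. A common workaround is to open such handles lazily inside each process and leave them out of the pickled state. The sketch below shows that pattern with a plain file handle as a stand-in, since an open file object is also unpicklable. LazyFileDataset is a hypothetical class, not part of torch or torchvision; real code would usually subclass torch.utils.data.Dataset and open whatever LMDB handle the dataset actually uses.

```python
import pickle
import tempfile


class LazyFileDataset:
    """Hypothetical map-style dataset that opens its file handle lazily.

    The handle is created on first access in whichever process uses the
    dataset and is excluded from the pickled state, so the object can be
    sent to spawned DataLoader workers.
    """

    def __init__(self, path: str):
        self.path = path
        self._handle = None  # opened lazily, once per process
        # Record the byte offset of each line with a short-lived handle
        # that is closed before __init__ returns.
        self._offsets = []
        with open(path, "rb") as f:
            pos = 0
            for line in f:
                self._offsets.append(pos)
                pos += len(line)

    def __getstate__(self):
        # Pickle everything except the (unpicklable) open handle.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state

    def __len__(self):
        return len(self._offsets)

    def __getitem__(self, index):
        if self._handle is None:
            self._handle = open(self.path, "rb")
        self._handle.seek(self._offsets[index])
        return self._handle.readline().rstrip(b"\r\n").decode()


if __name__ == "__main__":
    # delete=False keeps the temp file around after the with-block closes it.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("first\nsecond\nthird\n")
    dataset = LazyFileDataset(tmp.name)
    print(dataset[1])  # opens the handle in this process
    # Still round-trips through pickle even though a handle is now open.
    clone = pickle.loads(pickle.dumps(dataset))
    print(len(clone), clone[2])
```

Because __getstate__ returns a dict and no __setstate__ is defined, unpickling simply restores that dict, leaving _handle as None until the worker's first __getitem__ call.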