Sentdex / GANTheftAuto

Did you forget to mention the requirement of an Nvidia GPU?

LtqxWYEG opened this issue · comments

Because I'm stuck with Could not find module 'caffe2_nvrtc.dll' (or one of its dependencies).
And after some googling, that sounds to me like I'd need a CUDA-capable GPU.

Yes, an Nvidia GPU is required to run this. We did not even try to run these big models on a CPU, but even if that were possible, it would not be a great experience.
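
For anyone unsure whether their machine qualifies, a quick check (a minimal sketch, nothing project-specific) would be:

import torch

# Check whether PyTorch can see a usable CUDA device at all.
if torch.cuda.is_available():
    print('CUDA device found:', torch.cuda.get_device_name(0))
else:
    print('No CUDA device found; the models will not run as-is.')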

@LtqxWYEG Just comment out that workaround section in inference.py, lines 22-25, with triple quotes so it looks like this:
'''
# Workaround for PyTorch issue on Windows
if os.name == 'nt':
    import ctypes
    ctypes.cdll.LoadLibrary('caffe2_nvrtc.dll')
'''
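
An alternative sketch, untested against this repo, would keep the workaround but simply tolerate the missing DLL instead of crashing:

import ctypes
import os

# Keep the Windows workaround, but ignore a missing DLL.
if os.name == 'nt':
    try:
        ctypes.cdll.LoadLibrary('caffe2_nvrtc.dll')
    except OSError:
        pass  # DLL absent on CPU-only / non-CUDA installs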

This is not only about these lines of code. It has never been tested on a CPU, so currently an Nvidia GPU is required.
We might check whether it can run on a CPU, and if the results are good enough, we'll update the code accordingly.

Neither of us has access to an AMD GPU to test, but I think Torch has AMD support now, so it might work there too.

On CPU only, you'd likely struggle with FPS, and I'd imagine it'd be unpleasant, but as Daniel said above, we can give it a shot and see if there's a nice way to let people run this CPU-only.
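
If we do try it, the usual device-agnostic pattern would look roughly like this (a sketch only; the nn.Linear here is just a stand-in for the real generator, which nobody has tested on CPU):

import torch
import torch.nn as nn

# Pick the GPU when available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(4, 4).to(device)    # stand-in for the real generator
x = torch.randn(1, 4, device=device)  # inputs created on the same device
print(model(x).device)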

This is a project of 2 people, so testing things across a wide range of devices and OSes is not really possible.

In addition to what Harrison just said: if you have an AMD GPU set up for machine learning and you want to test it out (no modification to the code should be required for this), you're more than welcome to do so! :)
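
(The reason no code change should be needed: on ROCm builds, PyTorch is said to map HIP onto the torch.cuda namespace. A quick check, assuming one of the beta ROCm wheels, might be:)

import torch

# On ROCm builds, PyTorch exposes AMD GPUs through the torch.cuda
# namespace, so is_available() reports True on a supported card.
print(torch.cuda.is_available())
print(torch.version.hip)  # a version string on ROCm builds, None otherwise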

I'm not sure what you mean by "set up for ML", but yes, I'd like to try things. I commented out the section and now I get this error:

  File "C:\Users\redacted\AppData\Local\Programs\Python\Python39\lib\site-packages\torch-1.8.0-py3.9-win-amd64.egg\torch\cuda\__init__.py", line 261, in set_device
    torch._C._cuda_setDevice(device)
AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'

Of course, because there is no CUDA device.

The next step would be to comment out line 91 in the same file with a number sign, like this:
#torch.cuda.set_device(gpu)
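
A guard would arguably be safer than deleting the line, so CUDA machines keep working (a sketch against inference.py, untested here; torch and gpu are already defined in that file):

# Only select a GPU when CUDA is actually available.
if torch.cuda.is_available():
    torch.cuda.set_device(gpu)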

Oh. I didn't think it would be this easy :D

Ok, but this one is not that simple, right?

File "C:\Users\Distelzombie\AppData\Local\Programs\Python\Python39\lib\site-packages\torch-1.8.0-py3.9-win-amd64.egg\torch\cuda\__init__.py", line 164, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

AFAIR, I downloaded some CPU version of either Torch or something similar-sounding. Maybe it DOES have to be the CUDA version, even if it doesn't sound like that would work?
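
One way to tell which build you actually have installed (a quick, generic check):

import torch

print(torch.__version__)          # CUDA builds carry a '+cuXXX' suffix
print(torch.version.cuda)         # None on a CPU-only build
print(torch.cuda.is_available())  # False without a CUDA build and GPU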

Again, as a slight warning: neither I nor Daniel are familiar with AMD GPUs for deep learning. All I know is that TF and Torch recently added official AMD support. This is fairly new, so when you search the web you might see people saying AMD GPUs aren't supported; those articles are outdated.

AMD's "CUDA" equivalent is ROCm.

So, when you go to download Torch, for example on this page: https://pytorch.org/get-started/locally/

You would select ROCm (IF your GPU is AMD).

BEYOND that, however, I am not sure what else is required to actually set up DL on an AMD GPU, or whether all AMD GPUs are supported (just as with CUDA you need a CUDA-capable device, which not all NVIDIA GPUs are).

Also note the (beta) stipulation they're making. Who knows what further struggles await, but I encourage you to try it and report your findings back to us, should you be so brave :P

So, when you go to download Torch, for example on this page: https://pytorch.org/get-started/locally/

"NOTE: ROCm is not available on Windows"

:(
Maybe I should try it via Jupyter? But I've never used that.

I have an Nvidia GPU and was running into the same error message about the missing caffe2 DLL using the provided installation instructions. It worked for me once I installed the correct version of PyTorch:

pip3 install torch==1.9.0+cu102 torchvision==0.10.0+cu102 torchaudio===0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

System Specs:

  • Windows 10 19042.1052
  • i7-10700K
  • GTX 960

The PR above addresses this installation discrepancy to help prevent future confusion.
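
To confirm the right build landed, a check along these lines should work (the expected version tags are just what the command above requests):

import torch
import torchvision

print(torch.__version__)          # expect '1.9.0+cu102'
print(torchvision.__version__)    # expect '0.10.0+cu102'
print(torch.cuda.is_available())  # True once the driver sees the GPU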

@davidgfb Can I ask you to stop giving wrong advice? I already explained this to you, thank you.

The next step would be to comment out line 91 in the same file with a number sign, like this:
#torch.cuda.set_device(gpu)

Oh. I didn't think it would be this easy :D

Because it's not. You cannot just comment this out; it would not magically start using your AMD GPU if you did.
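
To illustrate with a minimal, repo-independent sketch: tensors default to the CPU, and any later .cuda() call still requires a CUDA build, so the last line below reproduces exactly the error you saw:

import torch

x = torch.zeros(1)  # tensors live on the CPU by default
print(x.device)     # prints 'cpu'
y = x.cuda()        # AssertionError: Torch not compiled with CUDA enabled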

Ok, but this one is not that simple, right?

File "C:\Users\Distelzombie\AppData\Local\Programs\Python\Python39\lib\site-packages\torch-1.8.0-py3.9-win-amd64.egg\torch\cuda\__init__.py", line 164, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

It looks like GameGAN is written specifically to use CUDA features, then, and to make it ROCm-compatible we'd need someone with an AMD GPU and the relevant knowledge. We do not own AMD GPUs, so we can't make it work ourselves.

AFAIR, I downloaded some CPU version of either Torch or something similar-sounding. Maybe it DOES have to be the CUDA version, even if it doesn't sound like that would work?

The error message suggests that the original GameGAN devs put this check there for a reason. Maybe it could be changed, but we don't know for now.

Oh. Well, it does make sense that commenting it out wouldn't make it start using my AMD GPU, but I expected it to use the CPU.
Anyway, I don't have the knowledge to make it work. I remember trying to get some other GAN to work on my PC (they also said it should work on most or almost all PCs), and that would only have worked via Jupyter, maybe.
I just used a Google Colab notebook with an Nvidia GPU for that GAN, but that wouldn't work here, I guess, since this isn't just a simple script-like program.

Then I give up :)
Please consider rephrasing "It should work on almost every PC" in the next project/video you do. You deep learning guys always seem to forget about AMD users. Why is that? :)

We've heard of broken CUDA installs that probably fell back to a CPU and ran at about a frame every few seconds.

AMD has never been a thing in machine learning, that's why. There have been unofficial attempts, but even now PyTorch's AMD GPU support is in beta (go to https://pytorch.org/ and see how it says beta next to ROCm). This is a very recent thing. And it's not on us ML guys; it's on AMD to join the party :)

Well, it would probably be easier for AMD if Nvidia didn't make everything closed-source, proprietary software. CUDA-powered GPUs also support programming frameworks such as OpenMP, OpenACC, and OpenCL, and AMD's GPUs do support OpenCL. So it's not just AMD's fault that nobody wants to use their hardware.

AMD makes their software open source, but oh, fancy Nvidia is too earnings-oriented to be fair :(

First, this is not the place to talk about closed-source vs. open-source drivers, nor about Nvidia vs. AMD. Not everything from AMD is open-sourced. What you are referring to is probably drivers and how that affects Linux (or the lack of ability to modify them). But like I said, not a topic to be discussed here.

Second, Nvidia has nothing to do with this situation. ML frameworks are open-sourced; it's the lack of an API in AMD's drivers and/or hardware support in their cards that kept these ML frameworks from using them. That has only changed lately, and there's still a lot of work to be done. AMD won't use Nvidia's drivers, will it?
As for this project: it requires CUDA, but that might be mostly a check that a GPU and drivers are present. Since there's no alternative, this is how it works. It's also open-sourced. PyTorch has beta support for ROCm, so anyone with a capable AMD GPU and the knowledge can modify this project to work on AMD GPUs. Nothing here is closed-sourced.

Nvidia has nothing to do with this situation

eeh

Nothing here is closed-sourced.

Of course. I meant the CUDA technology, which AMD could adopt if Nvidia would let them. (Much like PhysX, HairWorks, G-Sync, etc., CUDA is proprietary Nvidia tech that only works with their GPUs; by contrast, AMD's FreeSync, FidelityFX, and so on are all open source and work with Nvidia GPUs as well.)

Anyway, yes, this isn't the place to discuss this. I just wanted to vent my frustration with Nvidia :)

Thank you!
Kind regards :)

Just to make things clear about CUDA: it's closely tied to CUDA cores, which AMD does not have and will not have, and to drivers, which AMD also cannot use. Nvidia and AMD cards do not share an architecture the way, for example, AMD and Intel CPUs are both based on x86_64.
Also, I'm not trying to be in opposition to you, or just "on the other side"; I'm just sharing facts.
Cheers. :)

Well, you know, if Nvidia would let them, other GPU manufacturers could include CUDA cores, much like every card has SPUs, FPUs, ROPs, TMUs, emus, mops und blops.
That's why, currently (shrug), AMD is more consumer-friendly and less of a capitalist hell-lord. Also, have you seen AMD's driver GUI? It's like Nvidia is still in the '90s when it comes to UI/UX/mindset :P

Also, let me tell you about Microsoft! ... xD