SHI-Labs / Neighborhood-Attention-Transformer

Neighborhood Attention Transformer, arXiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arXiv 2022.


Build failure with PyTorch 1.11.0 / CUDA 11.3 / Win10 / VS2019

helonin opened this issue · comments

Thanks for your great work! Sadly, the build failed for me >_<
Ninja could not generate the file `nattenav_cuda.obj`. Please help.
Here is the error information:
[screenshots 1 and 2: build error output]
Thank you for your interest.
Could you run these and share their outputs?

python3 -c "import torch; print(torch.__version__); print(torch.cuda.is_available()); print(torch._C._cuda_getCompiledVersion(), torch.version.cuda)"
nvcc --version

It's basically failing to even start compiling, so it's likely either a torch or CUDA issue.
It's unlikely, but it could be ninja as well. Could you remove ninja and see if it builds?
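One quick way to check for the torch/CUDA mismatch suspected above is to compare `torch.version.cuda` against the release line that `nvcc --version` prints. A hypothetical helper (not part of natten, just a sketch of the check) might look like this:

```python
import re

def cuda_versions_match(torch_cuda: str, nvcc_output: str) -> bool:
    """Return True when torch's compiled CUDA major.minor matches the
    version nvcc reports. A mismatch between the two is a common cause
    of extension builds failing before compilation even starts.
    """
    m = re.search(r"release (\d+\.\d+)", nvcc_output)
    if m is None:
        return False
    return torch_cuda.split(".")[:2] == m.group(1).split(".")[:2]

# Example: torch.version.cuda == "11.3" vs a typical `nvcc --version` line
sample_nvcc = "Cuda compilation tools, release 11.3, V11.3.109"
print(cuda_versions_match("11.3", sample_nvcc))  # True when versions agree
```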

Thanks for your help! Here is the output:
[screenshot: Snipaste_2022-08-09_09-19-11]
And how can I remove ninja? I am a beginner at Python >_<

Something seems wrong with the paths here. The first post mentions D:\natten\nattencuda.py, and the build is failing to find files. I suspect something is going on with that, but I'm not very familiar with Windows path environments. Is this intended?

I don't think it is ninja. I'm pretty certain this is either a CUDA issue or an environment issue (probably some intersection of the two), given the RuntimeError in the first post. The big issue here is that I don't know where Windows caches builds. According to StyleGAN3's troubleshooting guide, it should be located at :\Users\<username>\AppData\Local\torch_extensions\torch_extensions\Cache, so you should clear any reference to natten there (it should be safe to clear everything).
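If you prefer to do this programmatically, here is a minimal sketch. It assumes the `%LOCALAPPDATA%` path expands to the cache location mentioned above; adjust it if your setup differs. Deleting the cache is safe because PyTorch rebuilds extensions on demand at the next import.

```python
import os
import shutil

def clear_extension_cache(cache_dir: str) -> bool:
    """Delete a torch extension build cache directory if it exists.

    Returns True if something was removed. Safe to run: PyTorch
    rebuilds cached extensions the next time they are imported.
    """
    if os.path.isdir(cache_dir):
        shutil.rmtree(cache_dir)
        return True
    return False

# Assumed Windows cache path (per StyleGAN3's troubleshooting guide).
win_cache = os.path.expandvars(
    r"%LOCALAPPDATA%\torch_extensions\torch_extensions\Cache"
)
print("removed:", clear_extension_cache(win_cache))
```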

@helonin can you edit gradcheck.py, place the import torch line above the natten import (@alihassanijr we should change this too, btw; our imports should come last), and print your Python info directly below it? Like this:

import torch
print(f"torch {torch.__version__} and cuda {torch.version.cuda}")
from nattencuda import NATTENAVFunction, NATTENQKRPBFunction

This should verify that the file sees the correct torch and CUDA versions (I suspect it doesn't). Let's see the output of that.

But if you want to uninstall ninja, you can just do so through pip.

I did everything following your guide, but another error occurred.
[screenshot: Snipaste_2022-08-09_09-19-11]

Could you remove the cache directory that @stevenwalton mentioned (alternatively, you could set your TORCH_EXTENSIONS_DIR env variable to somewhere else), remove ninja (pip uninstall ninja), and try again?

Is it possible that this is a Windows issue? I'm seeing that Python 3.8 only loads DLLs from trusted locations. @helonin, what version of Python are you using? Does this Stack Overflow link help?
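For context, the workaround that Stack Overflow thread points to is registering the CUDA bin directory explicitly: from Python 3.8 on, Windows no longer searches PATH for a module's dependent DLLs. A sketch of that workaround, assuming a default CUDA 11.3 install path (adjust to your machine):

```python
import os
import sys

# Assumed default install location for CUDA 11.3 on Windows; change
# this to wherever your toolkit actually lives.
cuda_bin = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\bin"

# os.add_dll_directory exists on Windows for Python 3.8+; it must be
# called before importing the compiled extension.
if sys.platform == "win32" and sys.version_info >= (3, 8) and os.path.isdir(cuda_bin):
    os.add_dll_directory(cuda_bin)
```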

My Python version is 3.7.10. I set the TORCH_EXTENSIONS_DIR env variable, but the same problem occurred. >_<
I have given up trying on Windows and will try to install NAT on Ubuntu soon. Thank you all the same!

[screenshot: Snipaste_2022-08-09_09-19-11]

Still looks like an environment variable issue. I think you should track down where TORCH_EXTENSIONS_DIR points, as well as where you're allowed to read files from (per the Stack Overflow link).

For Ubuntu, note that TORCH_EXTENSIONS_DIR defaults to ~/.cache/torch_extensions. The path won't exist until you build something.
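On Ubuntu, clearing a stale build comes down to removing that directory. A sketch (it falls back to the default path noted above when TORCH_EXTENSIONS_DIR is unset; deleting the cache is safe since it is rebuilt on the next import):

```shell
# Resolve the extensions cache, preferring the env var if it is set.
CACHE="${TORCH_EXTENSIONS_DIR:-$HOME/.cache/torch_extensions}"
rm -rf "$CACHE"
echo "cleared $CACHE"
```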

The build finished successfully on Ubuntu!
Thank you all the same!

I'll close this issue for now, but feel free to reopen it. We do need to test more on Windows.