SHI-Labs / Neighborhood-Attention-Transformer

Neighborhood Attention Transformer, arXiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arXiv 2022.

pytorch 1.12.0 CUDA 11.6 Win10 VS2019 build error

Ken1256 opened this issue

C:\Program Files\Python\Python37\lib\site-packages\torch\include\pybind11\cast.h(1429): error: too few arguments for template template parameter "Tuple"
          detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(1507): here

C:\Program Files\Python\Python37\lib\site-packages\torch\include\pybind11\cast.h(1503): error: too few arguments for template template parameter "Tuple"
          detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(1507): here

2 errors detected in the compilation of "C:/pytorch/NAT/natten/src/nattenav_cuda_kernel.cu".
nattenav_cuda_kernel.cu
ninja: build stopped: subcommand failed.

Hello and thank you for your interest.
We recommend using PyTorch 1.11.
1.12 is a very recent release and will likely require us to update the kernel.
However, the error you shared does not appear to be from our code.
Have you tried 1.11?
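
For reference, a quick sanity check (a minimal sketch, assuming a standard PyTorch install) of which PyTorch release and bundled CUDA toolkit your build reports:

import torch

# PyTorch release (e.g. 1.11.0 or 1.12.0)
print("torch:", torch.__version__)
# CUDA toolkit version this PyTorch build was compiled against (e.g. 11.3 or 11.6)
print("built with CUDA:", torch.version.cuda)
# whether a CUDA device is actually visible at runtime
print("CUDA available:", torch.cuda.is_available())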

Similar problem.
facebookresearch/pytorch3d#1127
Maybe a specific version is needed on Windows.

I seriously doubt that, because as I mentioned the error points to pybind, not to our code. Unless that's not the full error.
But again, I'd recommend using 1.11; we still haven't even tested our kernel on 1.12.

Edit: It appears to be an incompatibility issue with nvcc. I've seen multiple instances of this in other PyTorch CUDA extensions; these might help:

ashawkey/torch-ngp#51 (comment)

facebookresearch/pytorch3d#1024

bamsumit/slayerPytorch#86

Win10 VS2019 pytorch 1.11.0 CUDA 11.3 pass
Win10 VS2019 pytorch 1.12.0 CUDA 11.3 pass
Win10 VS2019 pytorch 1.12.0 CUDA 11.6 fail

pytorch/pytorch#69460

Are those your CUDA toolkit versions or CUDA driver versions?
Assuming it's the latter, did just using 1.12 with an earlier toolkit resolve the issue?

So what is your actual CUDA version, though?
Also, it's unclear: is the kernel working with the 11.3 toolkit?
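
For reference, a minimal sketch (assuming nvcc and nvidia-smi are on your PATH) that separates the toolkit version from the driver version, since the two are easy to mix up:

import subprocess
import torch
from torch.utils.cpp_extension import CUDA_HOME

# toolkit PyTorch itself was built against
print("torch built with CUDA:", torch.version.cuda)
# toolkit root that torch's extension builder will use (CUDA_HOME/bin/nvcc)
print("CUDA_HOME:", CUDA_HOME)
# nvcc --version reports the installed toolkit; nvidia-smi reports the driver's supported CUDA version
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)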

So is the issue resolved?

The NAT v0.11 issue is resolved.
NAT v0.12 has other build errors.

Z:\py_test\NAT_v0_12\natten\src\nattenav_cuda_kernel.cu(881): error: expected an expression

Z:\py_test\NAT_v0_12\natten\src\nattenav_cuda_kernel.cu(911): error: expected an expression

Z:\py_test\NAT_v0_12\natten\src\nattenav_cuda_kernel.cu(911): error: expected an expression

Z:\py_test\NAT_v0_12\natten\src\nattenav_cuda_kernel.cu(1167): error: expected an expression

Z:\py_test\NAT_v0_12\natten\src\nattenav_cuda_kernel.cu(1167): error: expected an expression

Z:\py_test\NAT_v0_12\natten\src\nattenav_cuda_kernel.cu(1167): error: expected an expression

Z:\py_test\NAT_v0_12\natten\src\nattenav_cuda_kernel.cu(1167): error: expected an expression

Z:\py_test\NAT_v0_12\natten\src\nattenav_cuda_kernel.cu(1214): error: expected an expression

Z:\py_test\NAT_v0_12\natten\src\nattenav_cuda_kernel.cu(1214): error: expected an expression

9 errors detected in the compilation of "Z:/py_test/NAT_v0_12/natten/src/nattenav_cuda_kernel.cu".
nattenav_cuda_kernel.cu
ninja: build stopped: subcommand failed.

Are you still on PyTorch v1.12 or 1.11?

On PyTorch v1.11.

Can you clear your compilation cache and try again? I just tried a fresh compile and it works out fine on multiple setups on my end.
I'm not sure where the cache would be on Windows; on Linux it's $HOME/.cache/torch_extensions.
Could you also confirm you're on the latest commit?
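
If it helps, here's a minimal sketch for locating and clearing that cache. The ~/.cache/torch_extensions fallback below is the Linux default; on Windows the default location may differ, so check the TORCH_EXTENSIONS_DIR environment variable or the path printed in your build log before deleting anything.

import os
import shutil

# TORCH_EXTENSIONS_DIR, if set, overrides the build cache used by torch.utils.cpp_extension.
# The fallback is the Linux default; the Windows default may differ (assumption), so verify
# the path shown in your build log before removing it.
cache_dir = os.environ.get(
    "TORCH_EXTENSIONS_DIR",
    os.path.join(os.path.expanduser("~"), ".cache", "torch_extensions"),
)
print("extension cache:", cache_dir)
if os.path.isdir(cache_dir):
    shutil.rmtree(cache_dir)  # forces a clean rebuild of natten on the next import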

After clearing the cache there are still build errors.
Did you test on Windows?
Does NAT v0.12 use much less memory than NAT v0.11?

I'm sorry to hear that.
Unfortunately no, we don't have a Windows environment, but the error is really strange.
Based on the error you shared, it's possible that it's not loading a header file that is new in v0.12.
But from what I'm seeing, it's probably an incompatibility somewhere in your environment (CUDA driver vs CUDA toolkit vs PyTorch version) that's resulting in the compilation error -- but again, I can't really say for certain with the information I have.

And no -- our NA extension just generally uses less memory than SWSA (I can get into details if you want); the memory usage hasn't changed in the new version. But our models will run a lot faster with the new version.

PyTorch 1.11 should work?

Yes. 1.11 is the recommended version.

Closing this due to inactivity. If you still have questions feel free to open it back up.