SHI-Labs / Neighborhood-Attention-Transformer

Neighborhood Attention Transformer, arxiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arxiv 2022

Running NAT Requires Removing Visibility for Old (sm_52 or earlier) GPUs

gauenk opened this issue · comments

Thank you for the great code. Initially, I had this error:

/home/gauenk/Documents/packages/nat/natten/src/natten1dav_cuda_kernel.cu(58): error: identifier "__hfma2" is undefined
          detected during instantiation of "void natten1dav_cuda_forward_kernel_fp16<KS,NS,scalar_t>(at::PackedTensorAccessor32<scalar_t, 4UL, at::DefaultPtrTraits>, at::PackedTensorAccessor32<scalar_t, 4UL, at::DefaultPtrTraits>, at::PackedTensorAccessor32<scalar_t, 4UL, at::DefaultPtrTraits>, int, int, int, int, int) [with KS=5, NS=2, scalar_t=c10::Half]

After debugging, I discovered the problem. I have a newer GPU (sm_75) and an older GPU (sm_52). If an old GPU is visible, the ninja build adds its "sm" architecture to the list of compile targets. Notice the "sm_52" in the text below. sm_52 predates native half-precision arithmetic, so intrinsics such as __hfma2 are undefined when compiling for it.

[1/3] c++ -MMD -MF natten1dav_cuda.o.d -DTORCH_EXTENSION_NAME=natten1dav_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/usr/local/cuda/include -isystem /home/gauenk/.local/lib/python3.8/site-packages/torch/include -isystem /home/gauenk/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/gauenk/.local/lib/python3.8/site-packages/torch/include/TH -isystem /home/gauenk/.local/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/gauenk/Documents/packages/nat/natten/src/natten1dav_cuda.cpp -o natten1dav_cuda.o
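
To illustrate why one old visible GPU is enough to break the build, here is a minimal sketch of how a set of compute capabilities turns into nvcc `-gencode` targets, and which of those targets lack half-precision arithmetic. This is not PyTorch's actual implementation: `gencode_flags` and `lacks_half_arithmetic` are hypothetical helpers, and the capability list is hard-coded to match the two GPUs described above.

```python
# Half-precision arithmetic intrinsics such as __hfma2 need compute capability 5.3 or higher.
HALF_ARITHMETIC_MIN = (5, 3)


def gencode_flags(capabilities):
    """Build one nvcc -gencode flag per distinct (major, minor) compute capability."""
    flags = []
    for major, minor in sorted(set(capabilities)):
        num = f"{major}{minor}"
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
    return flags


def lacks_half_arithmetic(capabilities):
    """Return the sm_XX targets for which fp16 arithmetic kernels cannot compile."""
    return [
        f"sm_{major}{minor}"
        for major, minor in sorted(set(capabilities))
        if (major, minor) < HALF_ARITHMETIC_MIN
    ]


# Assumed setup from the report: one sm_75 GPU and one sm_52 GPU are visible.
visible = [(7, 5), (5, 2)]
print(gencode_flags(visible))
print(lacks_half_arithmetic(visible))
```

Every kernel is compiled once per target, so a single sm_52 entry is enough for the fp16 kernel to fail. PyTorch's extension builder also reads the `TORCH_CUDA_ARCH_LIST` environment variable, so pinning it (for example to `7.5`) may be another way to keep the old architecture out of the build.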

The fix is easy. Only allow your "good" GPUs (sm_75 or newer) to be visible.

export CUDA_VISIBLE_DEVICES=0

I am leaving this comment in case this helps someone else.
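
When it is not obvious which device index to keep, the value for `CUDA_VISIBLE_DEVICES` can be derived from the compute capabilities. The following is a minimal sketch: `visible_devices_for` is a hypothetical helper, the capability map is hard-coded for illustration, and the default minimum of (6, 0) follows the maintainers' note later in this thread that only SM >= 60 is supported. It also assumes the indices match the order in which CUDA enumerates the devices.

```python
import os


def visible_devices_for(capabilities, minimum=(6, 0)):
    """Return a CUDA_VISIBLE_DEVICES value listing GPUs at or above `minimum`.

    `capabilities` maps a device index to its (major, minor) compute capability.
    """
    keep = [
        str(index)
        for index, cap in sorted(capabilities.items())
        if tuple(cap) >= tuple(minimum)
    ]
    return ",".join(keep)


# Assumed machine from the report: GPU 0 is sm_75, GPU 1 is sm_52.
caps = {0: (7, 5), 1: (5, 2)}
value = visible_devices_for(caps)
print(value)

# The variable must be set before anything initialises CUDA in this process.
os.environ["CUDA_VISIBLE_DEVICES"] = value
```

On a real machine the capability map could be filled in from `torch.cuda.get_device_capability(i)` for each device, run in a separate process so the restriction applies before the build starts.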

Thank you for your interest, and for sharing this.

Hello, I also have the same problem as you, as follows:

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: type name is not allowed

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: identifier "scalar_t" is undefined

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: expected a ";"

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: "block" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: "stream" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: expected a ";"

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: "block" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: "stream" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: expected a ";"

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: "block" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: "stream" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: expected a ";"

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: "block" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: "stream" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: expected a ";"

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: "block" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: "stream" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: expected a ";"

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: "block" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: "stream" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: expected a ";"

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: identifier "AT_PRIVATE_CASE_TYPE" is undefined

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(273): error: expected a ";"

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(271): warning: variable "block" was declared but never referenced

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: type name is not allowed

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: identifier "scalar_t" is undefined

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected a ";"

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: "attn_threads" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: "stream" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected a ";"

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: "attn_threads" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: "stream" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected a ";"

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: "attn_threads" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: "stream" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected a ";"

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: "attn_threads" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: "stream" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected a ";"

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: "attn_threads" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: "stream" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected a ";"

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: "attn_threads" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: "stream" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected a ";"

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected a ";"

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: "block" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: "stream" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected a ";"

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: "block" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: "stream" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected a ";"

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: "block" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: "stream" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected a ";"

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: "block" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: "stream" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected a ";"

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: "block" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected an identifier

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: "stream" has already been declared in the current scope

/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu(381): error: expected a ";"

Error limit reached.
100 errors detected in the compilation of "/home/c503/TinyLIC-main/compressai/layers/natten/src/natten1dav_cuda_kernel.cu".
Compilation terminated.
ninja: build stopped: subcommand failed.


Process finished with exit code 1

Can you tell me more about your solution? I'm a little confused!

Please also include the GPU you're using along with the CUDA version when asking questions about GPU errors. They can be quite difficult to solve.
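
As a convenience, the details requested above can be collected with a short script. This is only a sketch: `environment_report` is a hypothetical helper, it assumes `nvcc` is on the `PATH` if a toolkit is installed, and it skips the PyTorch fields when `torch` cannot be imported.

```python
import platform
import shutil
import subprocess


def environment_report():
    """Collect the details worth pasting into a GPU build-error report."""
    report = {
        "python": platform.python_version(),
        "nvcc": None,
        "torch": None,
        "torch_cuda": None,
        "gpus": [],
    }

    # Toolkit version as reported by nvcc, if it is installed and on PATH.
    nvcc = shutil.which("nvcc")
    if nvcc:
        out = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
        report["nvcc"] = next(
            (line.strip() for line in out.splitlines() if "release" in line), None
        )

    try:
        import torch
    except ImportError:
        return report

    report["torch"] = torch.__version__
    report["torch_cuda"] = torch.version.cuda  # CUDA version PyTorch was built with
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            major, minor = torch.cuda.get_device_capability(i)
            name = torch.cuda.get_device_name(i)
            report["gpus"].append(f"{i}: {name} (sm_{major}{minor})")
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Pasting this output next to the full build log makes it much easier to tell an unsupported architecture apart from a toolkit or source-checkout problem.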

The parent's (gauenk) solution worked because it forced only one GPU to be used (export CUDA_VISIBLE_DEVICES=0) instead of both (export CUDA_VISIBLE_DEVICES=0,1), presumably the newer one with sm_75. It is usually a bad idea to mix GPU models, because code built for a newer architecture often does not compile or run on an older one. Note that we only support SM >= 60.

For your specific case, I'm guessing that you're using an older sm environment. The output only shows that functions are not being properly defined. Please do your best to include the relevant information and to format logs as code (see the edit I made to your comment).

First of all, thank you for your answer. I use two 4090s and train on either gpu0 or gpu1. The CUDA version is 11.3.

@guoguo1314 I really don't think this issue is related for the following reasons:
A. 4090s are SM89, we definitely support that.
B. This issue is about a year old, and was relevant when we hadn't even packaged our kernels into a pip package.

The error you shared is suggesting there may be a bigger issue with your building NATTEN from source. Can you confirm which commit you're on?

Just to leave no stone unturned, I just tried NATTEN on 4090s:

> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
> nvidia-smi
Fri May 19 16:49:27 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.116.03   Driver Version: 525.116.03   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:41:00.0 Off |                  Off |
|  0%   31C    P8    39W / 480W |      1MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:61:00.0 Off |                  Off |
|  0%   29C    P8    39W / 480W |      1MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
> pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118
> pip3 install natten -f https://shi-labs.com/natten/wheels/cu118/torch2.0.0/index.html
Looking in links: https://shi-labs.com/natten/wheels/cu118/torch2.0.0/index.html
Collecting natten
  Downloading https://shi-labs.com/natten/wheels/cu118/torch2.0.0/natten-0.14.6%2Btorch200cu118-cp310-cp310-linux_x86_64.whl (74.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 74.1/74.1 MB 20.9 MB/s eta 0:00:00
Collecting packaging
  Downloading packaging-23.1-py3-none-any.whl (48 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48.9/48.9 kB 7.9 MB/s eta 0:00:00
Installing collected packages: packaging, natten
Successfully installed natten-0.14.6+torch200cu118 packaging-23.1
> make test
Running unit tests
python -m unittest discover -v -s ./tests
test_cpu_cuda_allclose (test_na1d.NA1DTest) ... ok
test_natten1dav_gradcheck_cpu_fast (test_na1d.NA1DTest) ... ok
test_natten1dav_gradcheck_cpu_slow (test_na1d.NA1DTest) ... ok
test_natten1dav_gradcheck_cuda_fast (test_na1d.NA1DTest) ... ok
test_natten1dav_gradcheck_cuda_slow (test_na1d.NA1DTest) ... ok
test_natten1dqk_gradcheck_cpu_fast (test_na1d.NA1DTest) ... ok
test_natten1dqk_gradcheck_cpu_slow (test_na1d.NA1DTest) ... ok
test_natten1dqk_gradcheck_cuda_fast (test_na1d.NA1DTest) ... ok
test_natten1dqk_gradcheck_cuda_slow (test_na1d.NA1DTest) ... ok
test_natten1dqkrpb_gradcheck_cpu_fast (test_na1d.NA1DTest) ... ok
test_natten1dqkrpb_gradcheck_cpu_slow (test_na1d.NA1DTest) ... ok
test_natten1dqkrpb_gradcheck_cuda_fast (test_na1d.NA1DTest) ... ok
test_natten1dqkrpb_gradcheck_cuda_slow (test_na1d.NA1DTest) ... ok
test_cpu_cuda_allclose (test_na2d.NA2DTest) ... ok
test_natten2dav_gradcheck_cpu_fast (test_na2d.NA2DTest) ... ok
test_natten2dav_gradcheck_cpu_slow (test_na2d.NA2DTest) ... ok
test_natten2dav_gradcheck_cuda_fast (test_na2d.NA2DTest) ... ok
test_natten2dav_gradcheck_cuda_slow (test_na2d.NA2DTest) ... ok
test_natten2dav_tiled11x11_gradcheck_cuda (test_na2d.NA2DTest) ... ok
test_natten2dav_tiled13x13_gradcheck_cuda (test_na2d.NA2DTest) ... ok
test_natten2dav_tiled3x3_gradcheck_cuda (test_na2d.NA2DTest) ... ok
test_natten2dav_tiled5x5_gradcheck_cuda (test_na2d.NA2DTest) ... ok
test_natten2dav_tiled7x7_gradcheck_cuda (test_na2d.NA2DTest) ... ok
test_natten2dav_tiled9x9_gradcheck_cuda (test_na2d.NA2DTest) ... ok
test_natten2dqk_gradcheck_cpu_fast (test_na2d.NA2DTest) ... ok
test_natten2dqk_gradcheck_cpu_slow (test_na2d.NA2DTest) ... ok
test_natten2dqk_gradcheck_cuda_fast (test_na2d.NA2DTest) ... ok
test_natten2dqk_gradcheck_cuda_slow (test_na2d.NA2DTest) ... ok
test_natten2dqkrpb_gradcheck_cpu_fast (test_na2d.NA2DTest) ... ok
test_natten2dqkrpb_gradcheck_cpu_slow (test_na2d.NA2DTest) ... ok
test_natten2dqkrpb_gradcheck_cuda_fast (test_na2d.NA2DTest) ... ok
test_natten2dqkrpb_gradcheck_cuda_slow (test_na2d.NA2DTest) ... ok
test_natten2dqkrpb_tiled11x11_gradcheck_cuda (test_na2d.NA2DTest) ... ok
test_natten2dqkrpb_tiled13x13_gradcheck_cuda (test_na2d.NA2DTest) ... ok
test_natten2dqkrpb_tiled3x3_gradcheck_cuda (test_na2d.NA2DTest) ... ok
test_natten2dqkrpb_tiled5x5_gradcheck_cuda (test_na2d.NA2DTest) ... ok
test_natten2dqkrpb_tiled7x7_gradcheck_cuda (test_na2d.NA2DTest) ... ok

First of all, thank you for your answer. I use two 4090s and train on either gpu0 or gpu1. The CUDA version is 11.3.

I'm pretty sure this is either incorrect, or you shouldn't be on CUDA 11.3. As far as I know, 11.8 is the earliest version that supports SM89 (i.e. the 4090).

Thank you, alihassanijr, that's very kind of you. I'm going to try that. Thanks again.