Unable to find a valid cuDNN algorithm to run convolution

Question

anderslanglands opened this issue 2 years ago

I'm getting the following error when trying to run:

texturize remix examples/dirt1.webp --size=256x256 --device=cuda

I installed as per instructions in the README. I'm on a pretty fresh Ubuntu 20.04, with NVIDIA driver 470 and an A6000. I don't have a CUDA toolkit installed aside from the one installed by conda.

I've also tried this on Ubuntu 18.04 under WSLg and on native Ubuntu 22.04. In some cases I get the traceback; in others it just sits there for a very long time (the trace might have printed eventually, but I got tired of waiting).

Any pointers you can give me would be appreciated.

Traceback (most recent call last):
  File "/home/anders/miniconda3/envs/texturize/bin/texturize", line 8, in <module>
    sys.exit(main())
  File "/home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/texturize/__main__.py", line 176, in main
    result, filenames = api.process_single_command(cmd, log, **config)
  File "/home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/texturize/api.py", line 99, in process_single_command
    for result in process_octaves(cmd, log=log, **config):
  File "/home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in generator_context
    x = next(gen)
  File "/home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/texturize/api.py", line 89, in process_octaves
    for r in process_iterations(cmd, **kwargs):
  File "/home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in generator_context
    x = next(gen)
  File "/home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/texturize/api.py", line 68, in process_iterations
    for result in app.process_octave(
  File "/home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/texturize/app.py", line 130, in process_octave
    for iteration, (loss, result_img, lr, retries) in enumerate(
  File "/home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/texturize/app.py", line 40, in run
    yield from self._run(progress, seed_img, *args, objective_class=oc, solver_class=sc)
  File "/home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/texturize/app.py", line 65, in _run
    for i, loss, converge, lr, retries in self._iterate(opt):
  File "/home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/texturize/app.py", line 85, in _iterate
    loss, scores = opt.step()
  File "/home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/texturize/solvers.py", line 84, in step
    self.optimizer.step(self.call_objective)
  File "/home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/optim/lbfgs.py", line 311, in step
    orig_loss = closure()
  File "/home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/texturize/solvers.py", line 51, in call_objective
    loss, scores = self.objective(self.image)
  File "/home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/texturize/solvers.py", line 170, in __call__
    loss.backward()
  File "/home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/autograd/__init__.py", line 98, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution (try_all at /opt/conda/conda-bld/pytorch_1591914858187/work/aten/src/ATen/native/cudnn/Conv.cpp:693)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x4e (0x7f091fd99b5e in /home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xd5d68d (0x7f0920f5068d in /home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0xd5e1d1 (0x7f0920f511d1 in /home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #3: <unknown function> + 0xd6220b (0x7f0920f5520b in /home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #4: at::native::cudnn_convolution_backward_input(c10::ArrayRef<long>, at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long, bool, bool) + 0xb2 (0x7f0920f55762 in /home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #5: <unknown function> + 0xdc9280 (0x7f0920fbc280 in /home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0xe0db18 (0x7f0921000b18 in /home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #7: at::native::cudnn_convolution_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long, bool, bool, std::array<bool, 2ul>) + 0x4fa (0x7f0920f56dfa in /home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #8: <unknown function> + 0xdc95ab (0x7f0920fbc5ab in /home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #9: <unknown function> + 0xe0db74 (0x7f0921000b74 in /home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #10: <unknown function> + 0x29dee26 (0x7f094db1be26 in /home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #11: <unknown function> + 0x2a2e634 (0x7f094db6b634 in /home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #12: torch::autograd::generated::CudnnConvolutionBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x378 (0x7f094d733ff8 in /home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #13: <unknown function> + 0x2ae7df5 (0x7f094dc24df5 in /home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #14: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0x16f3 (0x7f094dc220f3 in /home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #15: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x3d2 (0x7f094dc22ed2 in /home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #16: torch::autograd::Engine::thread_init(int) + 0x39 (0x7f094dc1b549 in /home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #17: torch::autograd::python::PythonEngine::thread_init(int) + 0x38 (0x7f0951165b08 in /home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #18: <unknown function> + 0xdbbf4 (0x7f09539b8bf4 in /home/anders/miniconda3/envs/texturize/lib/python3.8/site-packages/torch/lib/../../../.././libstdc++.so.6)
frame #19: <unknown function> + 0x8609 (0x7f096d121609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #20: clone + 0x43 (0x7f096d046133 in /lib/x86_64-linux-gnu/libc.so.6)

Anders Langlands · Answer 1 · Wed Jun 15 2022 12:29:49 GMT+0800 (China Standard Time)

--device=cpu works, but is obviously quite slow.

Hannu Töyrylä · Answer 2 · Wed Jun 15 2022 12:48:27 GMT+0800 (China Standard Time)

cuDNN can be disabled while still using CUDA by adding

torch.backends.cudnn.enabled = False

to the code.
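As a minimal sketch of that suggestion (not texturize's own code): the flag is set once, right after `import torch` and before any convolution runs. The `conv_smoke_test` helper is hypothetical and is only there to confirm that a convolution forward and backward pass completes with cuDNN switched off. It falls back to the CPU when no GPU is visible.

```python
import torch

# Disable cuDNN globally. CUDA is still used, but convolutions fall back
# to PyTorch's non-cuDNN implementations.
torch.backends.cudnn.enabled = False


def conv_smoke_test(device: str = "cpu") -> torch.Tensor:
    """Hypothetical helper: run one small convolution forward and backward."""
    conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1).to(device)
    image = torch.rand(1, 3, 32, 32, device=device, requires_grad=True)
    # backward() is the call that failed in the traceback above.
    conv(image).sum().backward()
    return image.grad


device = "cuda" if torch.cuda.is_available() else "cpu"
grad = conv_smoke_test(device)
print(device, tuple(grad.shape))
```

If the flag should only apply to part of a program, `torch.backends.cudnn.flags(enabled=False)` can be used as a context manager instead of the global assignment.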

Anders Langlands · Answer 3 · Wed Jun 15 2022 12:59:35 GMT+0800 (China Standard Time)

Thanks for your reply! Where would I put that... just somewhere at the top of __main__.py? EDIT: sticking it just after the torch import seems to have worked. Thanks!

And if I do want to use cuDNN, what do I do in that case? Just install the CUDA toolkit and have it in my LD_LIBRARY_PATH?

Hannu Töyrylä · Answer 4 · Wed Jun 15 2022 14:34:02 GMT+0800 (China Standard Time)

The cudatoolkit package installed by conda should be all you need, even for cuDNN. A different CUDA version might help. But disabling cuDNN alone should already take you a long way (I remember having similar problems sometimes).
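Before trying a different CUDA version, it may help to see which versions the conda environment's PyTorch actually reports. The sketch below uses a hypothetical `cuda_report` helper built on standard PyTorch attributes. One possible cause, which is an assumption and not something confirmed in this thread, is that the PyTorch build in the traceback targets a CUDA toolkit older than the A6000 (an Ampere card, compute capability 8.6) needs. Comparing `built_with_cuda` with `compute_capability` is a quick way to check.

```python
import torch


def cuda_report() -> dict:
    """Hypothetical helper: collect the versions PyTorch was built with and sees."""
    report = {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,    # None for CPU-only builds
        "cudnn": torch.backends.cudnn.version(),  # None if cuDNN is unavailable
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        # Only query the device when a GPU is actually visible.
        report["gpu"] = torch.cuda.get_device_name(0)
        report["compute_capability"] = torch.cuda.get_device_capability(0)
    return report


report = cuda_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

Run this inside the same environment as texturize (here, the `texturize` conda env) so the report reflects the libraries it actually loads.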

Anders Langlands · Answer 5 · Wed Jun 15 2022 16:02:43 GMT+0800 (China Standard Time)

Thanks for your help!