turboderp / exui

Web UI for ExLlamaV2

build error

deeeed opened this issue

Hi, I tried to build on a clean env and I'm not sure what I am missing.
Running on an NVIDIA GeForce RTX 4080.

 python3 -m venv venv
 source venv/bin/activate
 pip install -r requirements.txt

 python server.py

Traceback (most recent call last):
  File "/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/ext.py", line 14, in <module>
    import exllamav2_ext
ModuleNotFoundError: No module named 'exllamav2_ext'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2100, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/deeeed/dev/exui/server.py", line 11, in <module>
    from backend.models import update_model, load_models, get_model_info, list_models, remove_model, load_model, unload_model, get_loaded_model
  File "/home/deeeed/dev/exui/backend/models.py", line 5, in <module>
    from exllamav2 import(
  File "/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/__init__.py", line 3, in <module>
    from exllamav2.model import ExLlamaV2
  File "/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/model.py", line 17, in <module>
    from exllamav2.cache import ExLlamaV2CacheBase
  File "/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/cache.py", line 2, in <module>
    from exllamav2.ext import exllamav2_ext as ext_c
  File "/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/ext.py", line 126, in <module>
    exllamav2_ext = load \
  File "/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1308, in load
    return _jit_compile(
  File "/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1710, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1823, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2116, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'exllamav2_ext': [1/15] /usr/bin/nvcc  -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/TH -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext/cuda/h_gemm.cu -o h_gemm.cuda.o
FAILED: h_gemm.cuda.o
/usr/bin/nvcc  -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/TH -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext/cuda/h_gemm.cu -o h_gemm.cuda.o
nvcc fatal   : Unsupported gpu architecture 'compute_89'
[2/15] /usr/bin/nvcc  -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/TH -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext/cuda/lora.cu -o lora.cuda.o
FAILED: lora.cuda.o
/usr/bin/nvcc  -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/TH -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext/cuda/lora.cu -o lora.cuda.o
nvcc fatal   : Unsupported gpu architecture 'compute_89'
[3/15] /usr/bin/nvcc  -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/TH -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext/cuda/pack_tensor.cu -o pack_tensor.cuda.o
FAILED: pack_tensor.cuda.o
/usr/bin/nvcc  -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/TH -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext/cuda/pack_tensor.cu -o pack_tensor.cuda.o
nvcc fatal   : Unsupported gpu architecture 'compute_89'
[4/15] /usr/bin/nvcc  -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/TH -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext/cuda/quantize.cu -o quantize.cuda.o
FAILED: quantize.cuda.o
/usr/bin/nvcc  -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/TH -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext/cuda/quantize.cu -o quantize.cuda.o
nvcc fatal   : Unsupported gpu architecture 'compute_89'
[5/15] /usr/bin/nvcc  -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/TH -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext/cuda/q_matrix.cu -o q_matrix.cuda.o
FAILED: q_matrix.cuda.o
/usr/bin/nvcc  -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/TH -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext/cuda/q_matrix.cu -o q_matrix.cuda.o
nvcc fatal   : Unsupported gpu architecture 'compute_89'
[6/15] /usr/bin/nvcc  -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/TH -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext/cuda/q_attn.cu -o q_attn.cuda.o
FAILED: q_attn.cuda.o
/usr/bin/nvcc  -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/TH -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext/cuda/q_attn.cu -o q_attn.cuda.o
nvcc fatal   : Unsupported gpu architecture 'compute_89'
[7/15] /usr/bin/nvcc  -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/TH -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext/cuda/q_mlp.cu -o q_mlp.cuda.o
FAILED: q_mlp.cuda.o
/usr/bin/nvcc  -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/TH -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext/cuda/q_mlp.cu -o q_mlp.cuda.o
nvcc fatal   : Unsupported gpu architecture 'compute_89'
[8/15] /usr/bin/nvcc  -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/TH -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext/cuda/q_gemm.cu -o q_gemm.cuda.o
FAILED: q_gemm.cuda.o
/usr/bin/nvcc  -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/TH -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext/cuda/q_gemm.cu -o q_gemm.cuda.o
nvcc fatal   : Unsupported gpu architecture 'compute_89'
[9/15] /usr/bin/nvcc  -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/TH -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext/cuda/rope.cu -o rope.cuda.o
FAILED: rope.cuda.o
/usr/bin/nvcc  -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/TH -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext/cuda/rope.cu -o rope.cuda.o
nvcc fatal   : Unsupported gpu architecture 'compute_89'
[10/15] /usr/bin/nvcc  -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/TH -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext/cuda/rms_norm.cu -o rms_norm.cuda.o
FAILED: rms_norm.cuda.o
/usr/bin/nvcc  -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/TH -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext/cuda/rms_norm.cu -o rms_norm.cuda.o
nvcc fatal   : Unsupported gpu architecture 'compute_89'
[11/15] /usr/bin/nvcc  -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/TH -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext/cuda/cache.cu -o cache.cuda.o
FAILED: cache.cuda.o
/usr/bin/nvcc  -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/TH -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext/cuda/cache.cu -o cache.cuda.o
nvcc fatal   : Unsupported gpu architecture 'compute_89'
[12/15] c++ -MMD -MF sampling.o.d -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/TH -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -c /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext/cpp/sampling.cpp -o sampling.o
[13/15] c++ -MMD -MF quantize_func.o.d -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/TH -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -c /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext/cpp/quantize_func.cpp -o quantize_func.o
[14/15] c++ -MMD -MF ext.o.d -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/TH -isystem /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -c /home/deeeed/dev/exui/venv/lib/python3.10/site-packages/exllamav2/exllamav2_ext/ext.cpp -o ext.o
ninja: build stopped: subcommand failed.

(venv) ➜  exui git:(master) ✗

It looks like something is misconfigured. compute_89 is the correct compute capability for the 4080, so Torch has detected the GPU correctly, but maybe your CUDA toolkit (the nvcc at /usr/bin/nvcc in the log) is too old to support that architecture?
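
One way to test the "CUDA too old" theory is to read the release number that `nvcc --version` prints and compare it with the first toolkit release that accepts `compute_89`. The sketch below assumes that release is CUDA 11.8 and that nvcc reports itself with a "release X.Y" string; the helper names `parse_nvcc_release` and `nvcc_supports_sm89` are made up for illustration and are not part of exllamav2 or PyTorch.

```python
import re
import shutil
import subprocess

# Assumption: CUDA 11.8 is the first toolkit whose nvcc accepts compute_89 / sm_89.
MIN_CUDA_FOR_SM89 = (11, 8)


def parse_nvcc_release(version_text):
    """Return (major, minor) from `nvcc --version` output, or None if not found."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def nvcc_supports_sm89(version_text):
    """True if the reported toolkit release is at least MIN_CUDA_FOR_SM89."""
    release = parse_nvcc_release(version_text)
    return release is not None and release >= MIN_CUDA_FOR_SM89


if __name__ == "__main__":
    nvcc = shutil.which("nvcc")  # the nvcc found first on PATH
    if nvcc is None:
        print("nvcc not found on PATH")
    else:
        result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
        print("nvcc path:", nvcc)
        print("release:", parse_nvcc_release(result.stdout))
        print("supports compute_89:", nvcc_supports_sm89(result.stdout))
```

If this reports a release below 11.8, the "Unsupported gpu architecture" message in the log would be consistent with an outdated toolkit rather than a problem in exui itself.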

You can try one of the prebuilt wheels here.

Also you can install PyTorch from here to make sure you're getting the right version for your setup.
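
To see which PyTorch build actually ended up in the venv, a short check like the following may help. It is only a sketch: `describe_torch_cuda` is a hypothetical helper name, and it reports what the installed torch was built against, which is not necessarily the same CUDA toolkit that the system nvcc belongs to.

```python
def describe_torch_cuda():
    """Summarize the installed PyTorch build, or return None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None

    info = {
        "torch": torch.__version__,
        # CUDA version this torch build was compiled for (None on CPU-only builds).
        "built_for_cuda": torch.version.cuda,
        "gpu_capability": None,
    }
    if torch.cuda.is_available():
        # (major, minor) compute capability of device 0, e.g. (8, 9) for compute_89.
        info["gpu_capability"] = torch.cuda.get_device_capability(0)
    return info


print(describe_torch_cuda())
```

A mismatch between `built_for_cuda` here and the release reported by the system nvcc would point to the kind of misconfiguration described above.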

Thanks, it seems to work with the prebuilt wheels!