turboderp / exui

Web UI for ExLlamaV2


Installation fails in Windows 11

cleverestx opened this issue · comments


I was able to clone, but installing from requirements fails. How do I get around this?

For reference, I'm using an RTX 4090 system with an i9-13900K CPU and 96 GB of RAM, so any optimization suggestions are also appreciated.

You can install PyTorch separately, from here. CUDA toolkit can be installed from here.
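For example, a typical pip invocation for a CUDA-enabled build on Windows looks like the following (this is a sketch based on the PyTorch "Get Started" selector; the `cu121` tag is an assumption, adjust it to match your CUDA toolkit version):

```shell
# Install a CUDA-enabled PyTorch build (cu121 here) instead of the default CPU wheel.
# Pick the index URL whose cuXXX tag matches your installed CUDA toolkit.
pip install torch --index-url https://download.pytorch.org/whl/cu121

# Sanity check: the second value should print True on a working CUDA setup.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```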

To run the JIT version of ExLlamaV2 you'll also need the Visual Studio (or VS build tools) installed, but alternatively you can get a prebuilt wheel from here.

As for optimization, you'll probably want flash-attn-2 installed, though it can be a bit tricky on Windows. There are people who've got it working. Then, if you're on the latest NVIDIA driver, keep an eye on your VRAM usage and see if it seems suspiciously high (as in, higher than 24 GB), since the driver may have started swapping VRAM to system RAM, which slows everything to a crawl. There should be an option for disabling that behavior in the latest driver, or you could downgrade to 531.x or lower.
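One quick way to watch VRAM usage while generating, assuming `nvidia-smi` is on your PATH (it is with a standard driver install):

```shell
# Poll GPU memory once a second; if memory.used climbs past the card's
# physical VRAM, the driver's system-RAM fallback has likely kicked in.
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1
```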

"You can install PyTorch separately, from here. CUDA toolkit can be installed from here."

Thanks, but I already have both of these installed, because I use Automatic1111 (and SD.NEXT) and OOBE for AI text generation (all of which require these)... at least I know OOBE is using CUDA 12.1 and my other image generation software is using PyTorch... so I'm not sure why it's acting like I don't have these installed.

I guess I'll wait until flash-attn-2 is more friendly to install in Windows...not good enough at that...

Are you sure you have torch installed? According to the screenshot you don't, or at least it's in an isolated venv somewhere. It's also possible you have an older version (probably 2.0.1) and simply need to upgrade.
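A quick way to see which torch (if any) a given environment can actually import — a minimal sketch; run it with the same Python you use to launch exui, since each venv has its own packages:

```python
import importlib.util

def torch_status():
    # Report whether torch is importable in this environment and,
    # if so, which version and whether it was built with CUDA support.
    if importlib.util.find_spec("torch") is None:
        return "torch not installed in this environment"
    import torch
    backend = "CUDA available" if torch.cuda.is_available() else "CPU-only build"
    return f"torch {torch.__version__} ({backend})"

print(torch_status())
```

If this reports a CPU-only build or no torch at all, that matches the error in the screenshot even though other apps (in their own venvs) have working installs.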

As for Flash Attention, it's entirely optional. It helps on long contexts, but it's not a massive difference most of the time.

so... it needs to install another copy of pytorch and everything else, in addition to Auto1111, Oobabooga & etc?

It needs a copy of PyTorch. Oobabooga etc. install themselves into virtual environments to keep everything contained, because they have a mountain of dependencies that all need to be very specific versions. That also means those dependencies aren't available to other applications.

This pretty much just needs the latest PyTorch and ExLlamaV2.

Hmm ok, I installed the requirements, now it fails to start with

No CUDA runtime is found, using CUDA_HOME='C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA\v11.3'
Traceback (most recent call last):
...
import exllamav2_ext
ImportError: DLL load failed while importing exllamav2_ext: The specified module could not be found.

Does it really need the CUDA toolkit installed? Which version?
Where do I get exllamav2_ext?

exllamav2_ext is a component of the exllamav2 package. It's a PyTorch extension that gets built and loaded when you import exllamav2, which requires CUDA to be installed. You can also install the extension as its own package with `python setup.py install --user`.

Alternatively, there are prebuilt wheels here that contain both exllamav2 and exllamav2_ext pre-compiled for various CUDA and Python versions. These should work without the CUDA toolkit installed. It does look like there were some changes made to PyTorch between versions 2.0.1 and 2.1.0, so you'll probably need PyTorch>=2.1.0 since they were pre-built on that version.
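Prebuilt wheel filenames encode the Python version tag (e.g. cp311) and platform, so a mismatched download will also fail to import. A small snippet to print the tags for the interpreter you're actually using, so you can pick the matching wheel:

```python
import sys
import platform

# Python ABI tag as it appears in wheel filenames, e.g. "cp311"
py_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
# OS and architecture, to distinguish e.g. win_amd64 from linux_x86_64 wheels
print(py_tag, platform.system(), platform.machine())
```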

This is where I installed exllamav2 from, and it gave me all the above error messages.

It seems to me that pip install -r requirements.txt installed a CPU-only version of PyTorch. I've reinstalled it using the command line from https://pytorch.org/get-started/locally/, and now it actually works.

Oh. Well, sadly there's no good way to deal with PyTorch as a requirement. I guess I should make a note in the readme at least, because I think the version pip installs by default, if you don't already have one, is always going to be the CPU-only build. Which won't work, of course.