nomic-ai / gpt4all

GPT4All: Chat with Local LLMs on Any Device

Home Page: https://gpt4all.io


v2.8.0 crashes and disappears when using CUDA (incompatible PTX)

dsjlee opened this issue

Bug Report

GPT4All crashes and disappears when using CUDA.

Steps to Reproduce

  1. Go to Application General Settings.
  2. Choose CUDA: [your GPU name] in Device dropdown.
  3. Load model and submit prompt in chat window.

Expected Behavior

The model generates a response using the GPU, and the response text appears in the chat window.

Your Environment

  • GPT4All version: v2.8.0
  • Operating System: Windows 11
  • Chat model used: Llama3 Instruct, Mistral Instruct, Phi-3 Mini Instruct
  • GPU: Nvidia RTX A1000 6GB VRAM with driver R550 U5 (552.22) WHQL
  • CUDA Toolkit: v12.5, v12.4, v11.8 (made no difference)

Discussion on Discord shows other users also reporting the crash when using CUDA. I can see the model being loaded into the GPU's VRAM, but the application crashes and disappears nonetheless after a prompt is submitted.

Same here.
Win10, RTX 4060

Could both of you confirm what model of CPU you have? Not important, actually.

I was able to reproduce this issue - it's related to OOM. If OOM happens early enough, we handle it correctly:

ggml_backend_cuda_buffer_type_alloc_buffer: allocating 256.00 MiB on device 1: cudaMalloc failed: out of memory
alloc_tensor_range: failed to allocate CUDA1 buffer of size 268435456
llama_kv_cache_init: failed to allocate buffer for kv cache
llama_new_context_with_model: llama_kv_cache_init() failed for self-attention cache
LLAMA ERROR: failed to init context for model /mnt/nobackup/text-ai-models/gpt4all/Meta-Llama-3-8B-Instruct.Q4_0.gguf

But if it happens a little later (smaller margin), we crash:

ggml_backend_sched_alloc_splits: failed to allocate graph, reserving
CUDA error: out of memory
  current device: 1, in function alloc at /home/jared/src/forks/gpt4all/gpt4all-backend/llama.cpp-mainline/ggml-cuda.cu:312
  cuMemCreate(&handle, reserve_size, &prop, 0)
GGML_ASSERT: /home/jared/src/forks/gpt4all/gpt4all-backend/llama.cpp-mainline/ggml-cuda.cu:62: !"CUDA error"

> I was able to reproduce this issue - it's related to OOM. If OOM happens early enough, we handle it correctly:

Regarding OOM, the cause of my issue may be different. All the models I mentioned run fine in Vulkan mode. Phi-3 Instruct, for example, occupies 3 GB out of 6 GB VRAM. I can see GPT4All loading the model into VRAM in CUDA mode, but it crashes as soon as a prompt is submitted. Is there an error log that GPT4All saves somewhere?

CPU crashes too

It crashes on CPU too. I have a 2017 MacBook Air and it worked fine on CPU, but after a recent update it crashes on long prompts and clears the clipboard (copied text) from RAM. I suspect it stopped using swap memory and crashes when RAM runs out.

How to reproduce:

  • Select CPU only
  • Choose model Llama 3 Instruct or Mistral Instruct
  • Enter a long prompt of 2-3 sentences (23 words / 150 characters or more).
  • Run

Laptop specifications:

  • RAM: 8 GB
  • CPU: 1,8 GHz Dual-Core Intel Core i5
  • Graphics: Intel HD Graphics 6000 1536 MB

Is it possible to roll back an update without uninstalling and reinstalling GPT4All?

> I have a 2017 MacBook Air

This issue is specifically related to an out-of-memory condition on NVIDIA graphics cards. Since you do not have an NVIDIA graphics card, this is not your issue - please open a new one.

> Is it possible to roll back an update without uninstalling and reinstalling GPT4All?

You can install v2.7.5 from here, but it has to be installed to a clean directory; there is no one-step rollback.

It'd be best if you kept the latest version around so there's a better chance we can find the issue and fix it :P

> Is there an error log that GPT4All saves somewhere?

If it's hitting a GGML_ASSERT, then something is at least logged to stderr, but the Windows version of GPT4All has no console unless you build it from source with this line commented out.

I'm going to fix the known crash first (which on the surface has the exact same symptoms), and if it still crashes for you then we can try and diagnose your exact issue.

@dsjlee You can try the linked PR and see if it fixes your issue. I'm building an offline installer for it now, when it's done you will see it under the artifacts tab here.

Another way to see if your issue is the same as the crash I found is to reduce the GPU Layers setting to 1. If you no longer see a crash, then the issue is certainly OOM related.
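As a rough back-of-envelope (illustrative numbers only, assuming a ~4.7 GB Q4_0-quantized 8B model split across 32 layers, plus a small fixed overhead), here is why dropping GPU Layers to 1 should shrink VRAM use to a few hundred MB:

```python
# Rough estimate of VRAM use as a function of the "GPU Layers" setting.
# weights_gb, total_layers, and overhead_gb are illustrative assumptions,
# not measured values from GPT4All.
def vram_for_layers(n_gpu_layers, total_layers=32, weights_gb=4.7, overhead_gb=0.3):
    """Approximate VRAM (GB) needed when only n_gpu_layers are offloaded."""
    per_layer = weights_gb / total_layers
    return overhead_gb + n_gpu_layers * per_layer

print(round(vram_for_layers(32), 1))  # all layers offloaded → 5.0
print(round(vram_for_layers(1), 1))   # just one layer → 0.4
```

A single offloaded layer landing around 0.4 GB on a 6 GB card leaves a wide margin, so a crash at that setting points away from a simple OOM.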

> Another way to see if your issue is the same as the crash I found is to reduce the GPU Layers setting to 1. If you no longer see a crash, then the issue is certainly OOM related.

I set GPU Layers to 1 and saw GPU VRAM utilization of 0.4 GB out of 6 GB. It crashed nonetheless. I'm not looking for a resolution of this issue, as I have several other ways of running LLMs on my machines. Feel free to close this issue. I only opened it because other people were reporting it on Discord just after the v2.8.0 release.