nomic-ai / gpt4all

GPT4All: Chat with Local LLMs on Any Device

Home Page: https://gpt4all.io


v2.8.0 crashes and disappears when using CUDA (incompatible PTX)

dsjlee opened this issue

Bug Report

GPT4All crashes and disappears when using CUDA.

Steps to Reproduce

  1. Go to Application General Settings.
  2. Choose CUDA: [your GPU name] in Device dropdown.
  3. Load model and submit prompt in chat window.

Expected Behavior

The model generates a response using the GPU, and the response text appears in the chat window.

Your Environment

  • GPT4All version: v2.8.0
  • Operating System: Windows 11
  • Chat model used: Llama3 Instruct, Mistral Instruct, Phi-3 Mini Instruct
  • GPU: Nvidia RTX A1000 6GB VRAM with driver R550 U5 (552.22) WHQL
  • CUDA Toolkit: v12.5, v12.4, v11.8 (made no difference)

Discussion on Discord shows other users also reporting the crash when using CUDA. I can see the model being loaded into the GPU's VRAM, but the application crashes and disappears nonetheless after a prompt is submitted.

Same here.
Win10, RTX 4060

Could both of you confirm what model of CPU you have? Not important, actually.

I was able to reproduce this issue - it's related to OOM. If OOM happens early enough, we handle it correctly:

ggml_backend_cuda_buffer_type_alloc_buffer: allocating 256.00 MiB on device 1: cudaMalloc failed: out of memory
alloc_tensor_range: failed to allocate CUDA1 buffer of size 268435456
llama_kv_cache_init: failed to allocate buffer for kv cache
llama_new_context_with_model: llama_kv_cache_init() failed for self-attention cache
LLAMA ERROR: failed to init context for model /mnt/nobackup/text-ai-models/gpt4all/Meta-Llama-3-8B-Instruct.Q4_0.gguf

But if it happens a little later (smaller margin), we crash:

ggml_backend_sched_alloc_splits: failed to allocate graph, reserving
CUDA error: out of memory
  current device: 1, in function alloc at /home/jared/src/forks/gpt4all/gpt4all-backend/llama.cpp-mainline/ggml-cuda.cu:312
  cuMemCreate(&handle, reserve_size, &prop, 0)
GGML_ASSERT: /home/jared/src/forks/gpt4all/gpt4all-backend/llama.cpp-mainline/ggml-cuda.cu:62: !"CUDA error"

> I was able to reproduce this issue - it's related to OOM. If OOM happens early enough, we handle it correctly:

Regarding OOM, the cause of my issue may be different. All the models I mentioned run fine in Vulkan mode. Phi-3 Instruct, for example, occupies 3 GB out of 6 GB VRAM. I can see GPT4All loading the model into VRAM in CUDA mode, but it crashes as soon as a prompt is submitted. Is there an error log that GPT4All saves somewhere?

CPU crashes too

It crashes on CPU too. I have a 2017 MacBook Air and it worked fine on CPU, but after a recent update it crashes on long prompts and clears the clipboard (copied text) from RAM. I suspect it stopped using swap memory and crashes when RAM runs out.

How to reproduce:

  • Select CPU only
  • Choose model Llama 3 Instruct or Mistral Instruct
  • Enter a long prompt of 2-3 sentences (23 words / 150 characters or more).
  • Run

Laptop specifications:

  • RAM: 8 GB
  • CPU: 1,8 GHz Dual-Core Intel Core i5
  • Graphics: Intel HD Graphics 6000 1536 MB

Is it possible to roll back an update without uninstalling and reinstalling GPT4All?

> I have a 2017 MacBook Air

This issue is specifically related to an out-of-memory condition on NVIDIA graphics cards. Since you do not have an NVIDIA graphics card, this is not your issue - please open a new one.

> Is it possible to roll back an update without uninstalling and reinstalling GPT4All?

You can install v2.7.5 from here, but it has to be installed to a clean directory; there is no one-step rollback.

It'd be best if you kept the latest version around so there's a better chance we can find the issue and fix it :P

> Is there an error log that GPT4All saves somewhere?

If it's hitting a GGML_ASSERT, then something is at least logged to stderr, but the Windows version of GPT4All has no console unless you build it from source with this line commented out.

I'm going to fix the known crash first (which on the surface has the exact same symptoms), and if it still crashes for you then we can try and diagnose your exact issue.

@dsjlee You can try the linked PR and see if it fixes your issue. I'm building an offline installer for it now, when it's done you will see it under the artifacts tab here.

Another way to see if your issue is the same as the crash I found is to reduce the GPU Layers setting to 1. If you no longer see a crash, then the issue is certainly OOM related.
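As a rough back-of-envelope (illustrative numbers only, assuming a ~4.7 GB Q4_0-quantized 8B model split across 32 layers, plus a small fixed overhead), here is why dropping GPU Layers to 1 should shrink VRAM use to a few hundred MB:

```python
# Rough estimate of VRAM use as a function of the "GPU Layers" setting.
# weights_gb, total_layers, and overhead_gb are illustrative assumptions,
# not measured values from GPT4All.
def vram_for_layers(n_gpu_layers, total_layers=32, weights_gb=4.7, overhead_gb=0.3):
    """Approximate VRAM (GB) needed when only n_gpu_layers are offloaded."""
    per_layer = weights_gb / total_layers
    return overhead_gb + n_gpu_layers * per_layer

print(round(vram_for_layers(32), 1))  # all layers offloaded → 5.0
print(round(vram_for_layers(1), 1))   # just one layer → 0.4
```

A single offloaded layer landing around 0.4 GB on a 6 GB card leaves a wide margin, so a crash at that setting points away from a simple OOM.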

> Another way to see if your issue is the same as the crash I found is to reduce the GPU Layers setting to 1. If you no longer see a crash, then the issue is certainly OOM related.

I set GPU Layers to 1 and saw GPU VRAM utilization of 0.4 GB out of 6 GB. It crashed nonetheless. I'm not looking for a resolution of this issue, as I have several other ways of running LLMs on my machines. Feel free to close this issue. I only opened it because other people were reporting it on Discord just after the v2.8.0 release.