Running private-gpt inside Docker with the same definitions as non-Docker is extremely slow and unusable
BenBatsir opened this issue · comments
I'm trying to dockerize private-gpt
I've been successfully able to run it locally and it works just fine on my MacBook M1.
I've also been able to dockerize it and run it inside a container as a pre-step for my next steps (deploying on different hosts), but this time, when trying to get a response, it hangs and finally times out:
private-gpt-1 | 11:51:39.191 [WARNING ] llama_index.core.chat_engine.types - Encountered exception writing response to history: timed out
I increased the Docker resource limits (CPU, memory, and swap) to their maximums, but sadly that didn't solve the issue.
Any idea how I can overcome this?
Addressing the Apple Silicon GPU via the Metal API is currently not possible from inside Docker.
But you can use the CUDA Toolkit to address an NVIDIA CUDA GPU from a Docker container. See #1655 (comment)
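For reference, a minimal sketch of exposing an NVIDIA GPU to a container. This assumes the NVIDIA Container Toolkit is already installed on the host; the private-gpt image name and port are illustrative, not the project's official ones:

```shell
# Sanity check: confirm the container runtime can see the GPU.
# The CUDA base image tag here is just an example.
docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi

# Then run the private-gpt image (hypothetical name/tag) with GPU access:
docker run --rm --gpus all -p 8001:8001 my-private-gpt:latest
```

The key part is the `--gpus all` flag, which only works on hosts with an NVIDIA GPU and the toolkit configured, i.e. not on an M1 Mac.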
As @dpedwards said, you can't use Metal with Docker. The easiest and simplest way to use it on a Mac is Ollama running on top of the host, with PrivateGPT using any profile (running on top of the host or in Docker, as you like).
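To sketch that suggested setup (Ollama natively on the Mac so it can use Metal, PrivateGPT in a container): a hedged example, assuming Ollama's default port 11434 and Docker Desktop's `host.docker.internal` host alias. The image name and the exact environment variable for overriding the Ollama endpoint are assumptions, not confirmed from the project's docs:

```yaml
# docker-compose.yaml (sketch): PrivateGPT in a container,
# talking to Ollama running natively on the macOS host.
services:
  private-gpt:
    image: private-gpt:latest          # illustrative image name
    ports:
      - "8001:8001"
    environment:
      PGPT_PROFILES: ollama            # select the ollama settings profile
      # host.docker.internal resolves to the host machine from inside
      # Docker Desktop on macOS; 11434 is Ollama's default port.
      # Variable name is a guess at the setting override; adjust to
      # match your settings-ollama.yaml.
      PGPT_OLLAMA_API_BASE: http://host.docker.internal:11434
```

The point is that the model inference (the GPU-bound part) stays on the host where Metal is available, and only the PrivateGPT application layer runs in the container.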