Embedding server crashes when used with langchain openai embeddings
voorhs opened this issue · comments
Алексеев Илья commented
Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug.
If the bug concerns the server, please try to reproduce it first using the server test scenario framework.
The snippet:
from langchain_openai import OpenAIEmbeddings
embedding = OpenAIEmbeddings(
    model="-",
    api_key="sk-no-key-required",
    base_url="http://localhost:8666",
)
embedding.embed_documents(['hello there'])
Logs from server:
{"tid":"140695133081600","timestamp":1715435598,"level":"INFO","function":"update_slots","line":1807,"msg":"all slots are idle"}
{"tid":"140695133081600","timestamp":1715435604,"level":"INFO","function":"launch_slot_with_task","line":1036,"msg":"slot is processing task","id_slot":0,"id_task":0}
terminate called after throwing an instance of 'nlohmann::json_abi_v3_11_3::detail::type_error'
what(): [json.exception.type_error.302] type must be number, but is array
After that, the server stops.
The server is launched in a server-cuda container:
docker run --gpus all -v ./llm-gguf:/models -p 8666:8000 -e "CUDA_VISIBLE_DEVICES=2" local/llama.cpp:server-cuda -m /models/GritLM-7B-Q4_K_M.gguf --port 8000 --host 0.0.0.0 --n-gpu-layers 32 --embeddings
When used with the OpenAI Python client, everything works fine:
import openai
client = openai.OpenAI(
    base_url="http://localhost:8666",
    api_key="sk-no-key-required",
)
client.embeddings.create(input=['hello mister'], model='-').data[0].embedding
System: Ubuntu 20.04.6 LTS, NVIDIA A100-40GB
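A likely explanation (an assumption, not confirmed in this thread): LangChain's `OpenAIEmbeddings` pre-tokenizes input with tiktoken by default, so `"input"` in the request body becomes a list of token-ID arrays rather than a list of strings, while the plain OpenAI client sends the raw strings. A server that expects each input item to be a string (or a flat list of numbers) would then hit exactly a "type must be number, but is array" JSON type error. The toy sketch below mimics that payload-shape difference; the token IDs are illustrative, not real tiktoken output. If this is the cause, passing `check_embedding_ctx_length=False` to `OpenAIEmbeddings` (a real constructor parameter) should make LangChain send raw strings instead.

```python
# What the plain OpenAI client sends: a list of strings.
openai_payload = {"input": ["hello mister"], "model": "-"}

# What OpenAIEmbeddings sends by default (assumed behavior due to its
# tiktoken pre-tokenization): a list of token-ID arrays (IDs made up here).
langchain_payload = {"input": [[15339, 1070]], "model": "-"}

def accept_like_server(item):
    """Mimic a server that only accepts a string per input item."""
    if isinstance(item, str):
        return item
    raise TypeError("type must be number, but is array")

# The string payload is accepted; the token-array payload raises,
# mirroring the json.exception.type_error.302 crash in the logs.
[accept_like_server(x) for x in openai_payload["input"]]
try:
    [accept_like_server(x) for x in langchain_payload["input"]]
except TypeError as e:
    print(e)
```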
stygmate commented
I have exactly the same bug on a Mac Studio M1 Max (latest OS). I'm using the Hermes 2 Pro Llama 8B model.