mpt30b not showing any response.
heisenbergwasuncertain opened this issue · comments
I'm having the same problem. Processing goes to 100% for a few seconds, but it returns empty answers. RAM usage goes to around 24 GB.
I tested in VS Code and in cmd; same behaviour.
I've tried to debug, but the "generator" variable contained no text at all.
I'm running the mpt-30b-chat.ggmlv0.q5_1.bin model instead of the default q4_0.
PC: Ryzen 5900X and 32 GB RAM.
For cases like this I recommend Docker because of environment issues. I'm on Windows as well; here's how I run it.
Use a container like so:
docker run -it -w /transformers --mount type=volume,source=transformers,target=/transformers python:3.11.4 /bin/bash
Clone the repo:
git clone git@github.com:abacaj/mpt-30B-inference.git
Follow directions in the readme for the rest: https://github.com/abacaj/mpt-30B-inference#setup.
I just ran through this process once again, and it works: the model generates correctly on my Ryzen/Windows machine.
Thank you.
I've created a conda env, installed the requirements, and manually downloaded two models (q5_1 and q4_1). Any hint on why the responses are empty? I'd really prefer not to use a container.
Great work, by the way!
It likely has to do with the ctransformers library, since that is how the Python bindings call into ggml (though I'm not certain of it).
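If the bindings are silently yielding nothing, a small guard around the token stream makes that failure visible instead of printing an empty answer. This is only a sketch: the stand-in iterable below represents whatever generator your inference loop gets back (e.g. from ctransformers with streaming enabled); the function name and error message are illustrative, not from the repo.

```python
def collect_tokens(token_stream):
    """Join streamed tokens; raise if the backend produced nothing.

    token_stream: any iterable of strings, e.g. the generator your
    inference loop receives from the model bindings (assumption: your
    loop exposes such an iterable).
    """
    text = "".join(token_stream).strip()
    if not text:
        raise RuntimeError(
            "Model yielded no tokens - likely a binding/loader problem, "
            "not a bad prompt."
        )
    return text

# Example with a stand-in stream:
print(collect_tokens(iter(["Paris", " is", " the", " capital."])))
```

Failing loudly here separates "the model loaded but generated nothing" from "the prompt produced an empty reply", which is the ambiguity in this thread.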
I have observed that when processing user queries the CPU usage increases, but I do not receive a response:
[user]: What is the capital of France?
[assistant]:
[user]:
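For reference, the transcript above suggests the chat turns are flattened into a single prompt ending with an empty `[assistant]:` line for the model to complete. A hedged sketch of that formatting (the template is inferred from the transcript, not taken from the repo's source, and the actual code may differ):

```python
def build_prompt(history, user_message):
    """Flatten chat history into the [user]/[assistant] style shown above.

    history: list of (user_text, assistant_text) pairs.
    Note: this template is inferred from the transcript, not from
    the repo's source.
    """
    lines = []
    for user_text, assistant_text in history:
        lines.append(f"[user]: {user_text}")
        lines.append(f"[assistant]: {assistant_text}")
    lines.append(f"[user]: {user_message}")
    lines.append("[assistant]:")
    return "\n".join(lines)

print(build_prompt([], "What is the capital of France?"))
```

If the template the model was fine-tuned on differs from the one the script builds, the model can legitimately emit an end-of-text token immediately, which looks exactly like an empty response.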
python3 inference.py
Fetching 1 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3584.88it/s]
GGML_ASSERT: /home/runner/work/ctransformers/ctransformers/models/ggml/ggml.c:4103: ctx->mem_buffer != NULL
Aborted
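That `GGML_ASSERT: ctx->mem_buffer != NULL` fires when ggml fails to allocate its working buffer, which on a 32 GB machine loading a 20+ GB model usually means the process ran out of memory. A rough pre-flight check is to compare the model file size against available RAM; this sketch is Linux-only (it reads `/proc/meminfo`), and the 1.2× headroom factor is a guess, not a documented requirement:

```python
import os
import re

def mem_available_kb(meminfo_text):
    """Parse the MemAvailable value (in kB) from /proc/meminfo-style text."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB", meminfo_text, re.M)
    if match is None:
        raise ValueError("MemAvailable line not found")
    return int(match.group(1))

def model_fits_in_ram(model_path, headroom=1.2):
    """Rough check: model file size * headroom vs. currently available RAM.

    headroom is an assumed fudge factor for ggml's extra working buffers.
    """
    with open("/proc/meminfo") as f:
        available_bytes = mem_available_kb(f.read()) * 1024
    return os.path.getsize(model_path) * headroom <= available_bytes
```

Running this before `inference.py` (or simply closing other applications) can tell you whether the abort is a plain out-of-memory condition rather than a bug in the bindings.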
Issue fixed. Replace the files with the ones from:
https://github.com/mzubair31102/llama2.git
I'm also facing this issue on Windows.
However, the main problem is that when I run this in a container it produces very slow responses.
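One possible cause of the slowness (an assumption about your setup, not something confirmed in this thread): Docker Desktop on Windows runs containers inside a WSL2 VM whose memory is capped by default, so a 20+ GB model can end up swapping inside that cap. If that's your configuration, raising the limits in `%UserProfile%\.wslconfig` and restarting WSL may help; the values below are examples for a 32 GB machine, not recommendations:

```
[wsl2]
memory=28GB
swap=8GB
processors=8
```

After editing the file, run `wsl --shutdown` and restart Docker Desktop for the new limits to take effect.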