langflow-ai / langflow

⛓️ Langflow is a visual framework for building multi-agent and RAG applications. It's open-source, Python-powered, fully customizable, model and vector store agnostic.

Home Page: http://www.langflow.org

llama.cpp and gguf files

denijane opened this issue

I'm trying to create a flow using a locally run llama3 model. I tried using Ollama to run the llama3 model, but I'm getting strange responses: generation simply doesn't stop, and I end up watching the AI talk to itself. It is also very slow, roughly 10 times slower than what I get from running Ollama directly.

Then I decided to use the downloaded model directly with LlamaCpp to see if it works better. First problem: the LLM -> LlamaCpp component accepts only .bin files, while the newer model format is .gguf.

Then I downloaded an older model in .bin format, and I'm getting:
"ValueError: Error building node LlamaCpp(ID:LlamaCpp-BzhwI): Could not import llama-cpp-python library. Please install the llama-cpp-python library to use this embedding model: pip install llama-cpp-python"

I spent the night debugging this and I'm scratching my head. llama-cpp-python exports Llama, while the LlamaCpp class named in the error is imported from langchain_community, not from llama-cpp-python.

I ran a test in Python: after importing LlamaCpp from langchain_community, I was able to run Meta-Llama-3-8B-Instruct-Q6_K.gguf fine, but not llama-2-7b-chat.ggmlv3.q3_K_L.bin, which returns error(type=value_error).
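For reference, a minimal sketch of that test through the LangChain wrapper (the model paths are placeholders for wherever your local files live; adjust n_ctx for your hardware):

```python
# Minimal sketch: load a local model through LangChain's LlamaCpp wrapper.
from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./models/Meta-Llama-3-8B-Instruct-Q6_K.gguf",   # loads fine
    # model_path="./models/llama-2-7b-chat.ggmlv3.q3_K_L.bin",  # fails with error(type=value_error)
    n_ctx=2048,
)
print(llm.invoke("Say hello in one sentence."))
```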

I also tested with from llama_cpp import Llama directly - again, it works with .gguf files but not with .bin files.
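The equivalent test against llama-cpp-python itself (path again a placeholder) behaves the same way:

```python
# Minimal sketch: load the same GGUF file with llama-cpp-python directly.
from llama_cpp import Llama

llm = Llama(model_path="./models/Meta-Llama-3-8B-Instruct-Q6_K.gguf", n_ctx=2048)
out = llm("Q: What is the GGUF format? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```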

So I'm not sure which library Langflow actually uses. Maybe it's just a naming convention, calling the component LlamaCpp while it really calls Llama, or it really is LlamaCpp and the error message about the library is wrong. Either way, the accepted file format is definitely wrong, and LlamaCpp simply doesn't work.

I recommend using Ollama, which has more focused community support. I have recently been working on improvements to this component and expect to finish within 2-3 days; I have verified that it works correctly. In the meantime, I suggest you review my draft and adapt it yourself to get things running temporarily. (In my draft, remove the "buildConfig" section, since the buildConfig method is implemented incorrectly.)

#1701

Hi, I managed to make LlamaCpp work by editing the Python code of the LlamaCpp component to allow .gguf files (I also played with some source files, but I don't think those changes made the difference). So it runs.
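For anyone else hitting this, the edit amounts to widening the component's accepted file extensions. A hypothetical illustration of what that looks like - the names here (build_config, model_path, file_types) are assumptions, since the exact config schema varies between Langflow versions, so check the actual LlamaCpp component source in your install:

```python
# Hypothetical sketch of the change described above; field names are
# assumptions, not the actual Langflow component source.
class LlamaCppComponent:
    """Stand-in for Langflow's LlamaCpp component class."""

    def build_config(self):
        return {
            "model_path": {
                "display_name": "Model Path",
                "field_type": "file",
                "file_types": [".bin", ".gguf"],  # was [".bin"] only
            },
        }
```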

Now the problem is similar to the one I had with Ollama: 1) it is very slow (compared to just running ollama run llama3 and talking to it), and 2) it doesn't stop. It starts generating the human side of the conversation, then replies to itself, and this goes on forever.
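For what it's worth, the runaway generation is typically a stop-token problem: Llama 3's instruct models end each turn with <|eot_id|>, and if the wrapper never sends that as a stop sequence, the model happily keeps role-playing both sides of the dialogue. A minimal sketch of working around it via the LangChain wrapper (the path is a placeholder, and the prompt template is simplified - the full format also carries a system turn):

```python
# Sketch: pass Llama 3's end-of-turn token as a stop sequence so generation
# halts at the end of the assistant's turn instead of continuing the chat.
from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./models/Meta-Llama-3-8B-Instruct-Q6_K.gguf",
    n_ctx=2048,
    stop=["<|eot_id|>"],  # Llama 3 instruct end-of-turn token
)

# Simplified Llama-3-instruct prompt format.
prompt = (
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "What is the GGUF format?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
print(llm.invoke(prompt))
```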

So if the new version fixes at least 2), that would be a significant update.