[Feature Request] Support for GGUF models (llama.cpp compatible)
syddharth opened this issue
These run on both GPU and CPU. A lot of the OSS community uses them, and the models are quite light on VRAM.
Thank you for submitting this.
If you're using txtai 6.2+, you can do the following.
```yaml
# Embeddings index
writable: false
cloud:
  provider: huggingface-hub
  container: neuml/txtai-wikipedia

# llama.cpp pipeline
llama_cpp.Llama:
  model_path: path to GGUF file

# Extractor pipeline
extractor:
  path: llama_cpp.Llama
  output: reference

txtchat.pipeline.wikisearch.Wikisearch:
  # Add application reference
  application:

workflow:
  wikisearch:
    tasks:
      - action: txtchat.pipeline.wikisearch.Wikisearch
```
You just need to make sure you also have https://github.com/abetlen/llama-cpp-python installed (`pip install llama-cpp-python`).
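For illustration, the `llama_cpp.Llama` entry in the configuration above resolves to roughly the call below. This is a minimal sketch, assuming llama-cpp-python is installed; the model filename is a placeholder, not part of the configuration above.

```python
from llama_cpp import Llama

# Load a GGUF model directly with llama-cpp-python
# (the model filename here is a placeholder)
llm = Llama(model_path="models/mistral-7b-openorca.Q4_K_M.gguf")

# Run a completion; the result is a dict with a "choices" list
result = llm("Q: What is the capital of France? A:", max_tokens=16)
print(result["choices"][0]["text"])
```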
Thanks for this. The GGUF model loads correctly, though I'm now getting the following error:
```
Traceback (most recent call last):
  File "C:\Users\mates\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\mates\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "c:\AI\T2T\txtchat\venv\lib\site-packages\txtchat\agent\__main__.py", line 21, in <module>
    agent = AgentFactory.create(sys.argv[1])
  File "c:\AI\T2T\txtchat\venv\lib\site-packages\txtchat\agent\factory.py", line 34, in create
    return RocketChat(config)
  File "c:\AI\T2T\txtchat\venv\lib\site-packages\txtchat\agent\rocketchat.py", line 30, in __init__
    super().__init__(config)
  File "c:\AI\T2T\txtchat\venv\lib\site-packages\txtchat\agent\base.py", line 32, in __init__
    self.application = Application(config)
  File "c:\AI\T2T\txtchat\venv\lib\site-packages\txtai\app\base.py", line 72, in __init__
    self.pipes()
  File "c:\AI\T2T\txtchat\venv\lib\site-packages\txtai\app\base.py", line 129, in pipes
    self.pipelines[pipeline] = PipelineFactory.create(config, pipeline)
  File "c:\AI\T2T\txtchat\venv\lib\site-packages\txtai\pipeline\factory.py", line 55, in create
    return pipeline if isinstance(pipeline, types.FunctionType) else pipeline(**config)
  File "c:\AI\T2T\txtchat\venv\lib\site-packages\txtchat\pipeline\wikisearch.py", line 32, in __init__
    self.workflow = Workflow([Question(action=application.pipelines["extractor"]), WikiAnswer()])
KeyError: 'extractor'
```
Did you run the exact configuration provided above?
Just added a fix with #13 that should resolve the KeyError you're seeing above.
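For anyone hitting this before updating, you can check whether the extractor pipeline was registered with a quick sketch like the one below (the config filename is hypothetical):

```python
from txtai.app import Application

# Load the YAML configuration above (filename is hypothetical)
app = Application("wikisearch.yml")

# Wikisearch looks up application.pipelines["extractor"] at startup;
# if this prints False, the KeyError above is expected
print("extractor" in app.pipelines)
```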
If you install txtai from source, there is now direct support for llama.cpp models. See this article for more.
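As a quick sketch of what that direct support looks like (the model path is illustrative and llama-cpp-python is still required):

```python
from txtai.pipeline import LLM

# txtai routes GGUF paths to the llama.cpp backend
llm = LLM("TheBloke/Mistral-7B-OpenOrca-GGUF/mistral-7b-openorca.Q4_K_M.gguf")
print(llm("Answer in one sentence: What is the Eiffel Tower?"))
```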