neuml / txtchat

💭 Retrieval augmented generation (RAG) and language model powered search applications

[Feature Request] Support for GGUF models (llama.cpp compatible)

syddharth opened this issue · comments

These run on both GPU and CPU. A lot of the OSS community uses them, and the models are quite light on VRAM.
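
For context, this is roughly what loading a GGUF model through llama-cpp-python looks like. A minimal sketch, assuming a local GGUF file; the model path and generation settings are placeholders, not recommendations:

# Minimal sketch using llama-cpp-python (https://github.com/abetlen/llama-cpp-python)
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-instruct.Q4_K_M.gguf",  # any local GGUF file
    n_ctx=2048,         # context window size
    n_gpu_layers=-1     # offload all layers to GPU if available; set to 0 for CPU-only
)

result = llm("Q: What is retrieval augmented generation? A:", max_tokens=64)
print(result["choices"][0]["text"])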

Thank you for submitting this.

If you're using txtai 6.2+, you can do the following.

# Embeddings index
writable: false
cloud:
  provider: huggingface-hub
  container: neuml/txtai-wikipedia

# llama.cpp pipeline
llama_cpp.Llama:
  model_path: path to GGUF file

# Extractor pipeline
extractor:
  path: llama_cpp.Llama
  output: reference

txtchat.pipeline.wikisearch.Wikisearch:
  # Add application reference
  application:

workflow:
  wikisearch:
    tasks:
    - action: txtchat.pipeline.wikisearch.Wikisearch

You just need to make sure you also have https://github.com/abetlen/llama-cpp-python installed.
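
Before wiring it into an agent, you can sanity check the configuration by loading it directly with txtai. A hedged sketch, assuming the YAML above is saved as wikisearch.yml and llama-cpp-python is installed (pip install llama-cpp-python); the query is only an example:

# Load the config above and run the wikisearch workflow directly
from txtai.app import Application

app = Application("wikisearch.yml")

# Run a single query through the workflow defined in the config
for result in app.workflow("wikisearch", ["Tell me about the Roman Empire"]):
    print(result)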

Thanks for this. The GGUF model loads correctly, though I am now getting the following error:

Traceback (most recent call last):
  File "C:\Users\mates\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\mates\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "c:\AI\T2T\txtchat\venv\lib\site-packages\txtchat\agent\__main__.py", line 21, in <module>
    agent = AgentFactory.create(sys.argv[1])
  File "c:\AI\T2T\txtchat\venv\lib\site-packages\txtchat\agent\factory.py", line 34, in create
    return RocketChat(config)
  File "c:\AI\T2T\txtchat\venv\lib\site-packages\txtchat\agent\rocketchat.py", line 30, in __init__
    super().__init__(config)
  File "c:\AI\T2T\txtchat\venv\lib\site-packages\txtchat\agent\base.py", line 32, in __init__
    self.application = Application(config)
  File "c:\AI\T2T\txtchat\venv\lib\site-packages\txtai\app\base.py", line 72, in __init__
    self.pipes()
  File "c:\AI\T2T\txtchat\venv\lib\site-packages\txtai\app\base.py", line 129, in pipes
    self.pipelines[pipeline] = PipelineFactory.create(config, pipeline)
  File "c:\AI\T2T\txtchat\venv\lib\site-packages\txtai\pipeline\factory.py", line 55, in create
    return pipeline if isinstance(pipeline, types.FunctionType) else pipeline(**config)
  File "c:\AI\T2T\txtchat\venv\lib\site-packages\txtchat\pipeline\wikisearch.py", line 32, in __init__
    self.workflow = Workflow([Question(action=application.pipelines["extractor"]), WikiAnswer()])
KeyError: 'extractor'

Did you run the exact configuration provided above?

Just added a fix with #13 that should resolve the KeyError you're receiving above.

If you install txtai from source, there is now direct support for llama.cpp models. See this article for more.

https://neuml.hashnode.dev/integrate-llm-frameworks
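
For reference, the article shows passing a GGUF model reference straight to txtai's LLM pipeline, which routes it through llama.cpp. A hedged sketch; the model path below is illustrative:

# Sketch of txtai's direct llama.cpp support (txtai installed from source at the time)
from txtai.pipeline import LLM

# A .gguf model reference is loaded through llama.cpp
llm = LLM("TheBloke/Mistral-7B-OpenOrca-GGUF/mistral-7b-openorca.Q4_K_M.gguf")
print(llm("What is retrieval augmented generation?"))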