neuml / txtchat

💭 Retrieval augmented generation (RAG) and language model powered search applications

[Feature Request] Support for GGUF models (llama.cpp compatible)

syddharth opened this issue · comments

These run on both GPU and CPU. A lot of the OSS community uses them, and the models are quite light on VRAM.
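
For context, this is roughly what loading a GGUF model through llama-cpp-python looks like. A minimal sketch, assuming a local GGUF file; the model path and generation settings are placeholders, not recommendations:

# Minimal sketch using llama-cpp-python (https://github.com/abetlen/llama-cpp-python)
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-instruct.Q4_K_M.gguf",  # any local GGUF file
    n_ctx=2048,         # context window size
    n_gpu_layers=-1     # offload all layers to GPU if available; set to 0 for CPU-only
)

result = llm("Q: What is retrieval augmented generation? A:", max_tokens=64)
print(result["choices"][0]["text"])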

Thank you for submitting this.

If you're using txtai 6.2+, you can do the following.

# Embeddings index
writable: false
cloud:
  provider: huggingface-hub
  container: neuml/txtai-wikipedia

# llama.cpp pipeline
llama_cpp.Llama:
  model_path: path to GGUF file

# Extractor pipeline
extractor:
  path: llama_cpp.Llama
  output: reference

txtchat.pipeline.wikisearch.Wikisearch:
  # Add application reference
  application:

workflow:
  wikisearch:
    tasks:
    - action: txtchat.pipeline.wikisearch.Wikisearch

You just need to make sure you also have https://github.com/abetlen/llama-cpp-python installed.
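
Before wiring it into an agent, you can sanity check the configuration by loading it directly with txtai. A hedged sketch, assuming the YAML above is saved as wikisearch.yml and llama-cpp-python is installed (pip install llama-cpp-python); the query is only an example:

# Load the config above and run the wikisearch workflow directly
from txtai.app import Application

app = Application("wikisearch.yml")

# Run a single query through the workflow defined in the config
for result in app.workflow("wikisearch", ["Tell me about the Roman Empire"]):
    print(result)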

Thanks for this. The GGUF model loads correctly, though I am now getting the following error:

Traceback (most recent call last):
  File "C:\Users\mates\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\mates\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "c:\AI\T2T\txtchat\venv\lib\site-packages\txtchat\agent\__main__.py", line 21, in <module>
    agent = AgentFactory.create(sys.argv[1])
  File "c:\AI\T2T\txtchat\venv\lib\site-packages\txtchat\agent\factory.py", line 34, in create
    return RocketChat(config)
  File "c:\AI\T2T\txtchat\venv\lib\site-packages\txtchat\agent\rocketchat.py", line 30, in __init__
    super().__init__(config)
  File "c:\AI\T2T\txtchat\venv\lib\site-packages\txtchat\agent\base.py", line 32, in __init__
    self.application = Application(config)
  File "c:\AI\T2T\txtchat\venv\lib\site-packages\txtai\app\base.py", line 72, in __init__
    self.pipes()
  File "c:\AI\T2T\txtchat\venv\lib\site-packages\txtai\app\base.py", line 129, in pipes
    self.pipelines[pipeline] = PipelineFactory.create(config, pipeline)
  File "c:\AI\T2T\txtchat\venv\lib\site-packages\txtai\pipeline\factory.py", line 55, in create
    return pipeline if isinstance(pipeline, types.FunctionType) else pipeline(**config)
  File "c:\AI\T2T\txtchat\venv\lib\site-packages\txtchat\pipeline\wikisearch.py", line 32, in __init__
    self.workflow = Workflow([Question(action=application.pipelines["extractor"]), WikiAnswer()])
KeyError: 'extractor'

Did you run the exact configuration provided above?

Just added a fix with #13 that should resolve the KeyError you're receiving above.

If you install txtai from source, there is now direct support for llama.cpp models. See this article for more.

https://neuml.hashnode.dev/integrate-llm-frameworks
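
For reference, the article shows passing a GGUF model reference straight to txtai's LLM pipeline, which routes it through llama.cpp. A hedged sketch; the model path below is illustrative:

# Sketch of txtai's direct llama.cpp support (txtai installed from source at the time)
from txtai.pipeline import LLM

# A .gguf model reference is loaded through llama.cpp
llm = LLM("TheBloke/Mistral-7B-OpenOrca-GGUF/mistral-7b-openorca.Q4_K_M.gguf")
print(llm("What is retrieval augmented generation?"))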