
local-rag

Load the dolphin-2.5-mixtral-8x7b model in GGUF format locally into Ollama, then build a small RAG (Retrieval-Augmented Generation) pipeline with llama-index, using Qdrant as the database for up-to-date personal data.

Warning

This is only a collection of Python/Bash scripts; no guarantees are given.

Steps to a personal RAG

  • install Ollama and Qdrant (running both via Docker is fine, too; see the Docker sketch after this list)

  • download the model in GGUF format from its Hugging Face page

  • run buildLocalModel.sh to create a complete Ollama model from the GGUF file (the Modelfile sketch after this list shows the idea)

  • edit and run localrunnerWithoutIndex.py to create an index in Qdrant and fill it with your personal data (JSON is the best format to use; see the indexing sketch after this list)

  • later, use localrunnerWithExistingIndex.py to reuse the existing index (see the sketch after this list)

  • test it via the terminal output

  • run startHttpEndpoint.sh to start a small Flask HTTP endpoint (see the endpoint sketch after this list)

  • go to http://127.0.0.1:5000/process_form to play around

  • test it with cURL

    curl -X POST http://127.0.0.1:5000/process_form -F 'query="What does the author think about Star Trek?"'
  • edit httpEndpoint.py for further customization
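
For the install step, both services can be run via Docker. A minimal sketch, using the upstream default images and ports (adjust as needed):

    # Qdrant: REST API on port 6333
    docker run -d --name qdrant -p 6333:6333 qdrant/qdrant
    # Ollama: API on port 11434, models persisted in a named volume
    docker run -d --name ollama -v ollama:/root/.ollama -p 11434:11434 ollama/ollama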
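
buildLocalModel.sh turns the downloaded GGUF file into a model that Ollama can serve. A hypothetical sketch of that idea; the Hugging Face repo id, the quantization file name, and the Ollama model name here are assumptions, not necessarily what the script uses:

    # download one quantization of the model (file name is an example;
    # pick whichever quantization fits your hardware)
    huggingface-cli download TheBloke/dolphin-2.5-mixtral-8x7b-GGUF \
        dolphin-2.5-mixtral-8x7b.Q4_K_M.gguf --local-dir .

    # point a Modelfile at the GGUF file and register it with Ollama
    echo "FROM ./dolphin-2.5-mixtral-8x7b.Q4_K_M.gguf" > Modelfile
    ollama create dolphin-2.5-mixtral-8x7b -f Modelfile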
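
localrunnerWithoutIndex.py creates the index and fills it with your data. A minimal sketch of that flow with current llama-index APIs; the collection name, data directory, and embedding model are assumptions, not the script's actual values:

    import qdrant_client
    from llama_index.core import (SimpleDirectoryReader, StorageContext,
                                  Settings, VectorStoreIndex)
    from llama_index.embeddings.huggingface import HuggingFaceEmbedding
    from llama_index.llms.ollama import Ollama
    from llama_index.vector_stores.qdrant import QdrantVectorStore

    # the local model registered with Ollama above, plus a local embedder
    Settings.llm = Ollama(model="dolphin-2.5-mixtral-8x7b", request_timeout=300.0)
    Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

    # connect to the local Qdrant instance and pick a collection
    client = qdrant_client.QdrantClient(host="localhost", port=6333)
    vector_store = QdrantVectorStore(client=client, collection_name="personal_data")
    storage_context = StorageContext.from_defaults(vector_store=vector_store)

    # read the personal data (e.g. JSON files) and index it into Qdrant
    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

    print(index.as_query_engine().query("What does the author think about Star Trek?"))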
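
localrunnerWithExistingIndex.py then reattaches to the already-filled collection instead of re-indexing. A sketch under the same assumptions:

    import qdrant_client
    from llama_index.core import Settings, VectorStoreIndex
    from llama_index.embeddings.huggingface import HuggingFaceEmbedding
    from llama_index.llms.ollama import Ollama
    from llama_index.vector_stores.qdrant import QdrantVectorStore

    Settings.llm = Ollama(model="dolphin-2.5-mixtral-8x7b", request_timeout=300.0)
    Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

    # rebuild the index object from the existing Qdrant collection
    client = qdrant_client.QdrantClient(host="localhost", port=6333)
    vector_store = QdrantVectorStore(client=client, collection_name="personal_data")
    index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

    print(index.as_query_engine().query("What does the author think about Star Trek?"))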
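
startHttpEndpoint.sh launches httpEndpoint.py. A sketch of what a /process_form handler matching the cURL call above could look like; the handler body and the rag_setup import are assumptions, not the script's actual contents:

    from flask import Flask, request

    from rag_setup import query_engine  # hypothetical module: build the
                                        # query engine as in the sketches above

    app = Flask(__name__)

    @app.route("/process_form", methods=["GET", "POST"])
    def process_form():
        if request.method == "POST":
            # cURL's -F flag sends multipart form data,
            # which Flask exposes via request.form
            query = request.form.get("query", "")
            return str(query_engine.query(query))
        # a tiny form for playing around in the browser
        return '<form method="post"><input name="query"><button>Ask</button></form>'

    if __name__ == "__main__":
        app.run(host="127.0.0.1", port=5000)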

What is retrieval-augmented generation?

RAG is an AI framework for retrieving facts from an external knowledge base to ground large language models (LLMs) on the most accurate, up-to-date information and to give users insight into LLMs' generative process.
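
In code terms, the two phases are explicit. A conceptual sketch, reusing the index and Settings from the sketches above:

    # 1. retrieval: fetch the most relevant personal-data chunks from Qdrant
    retriever = index.as_retriever(similarity_top_k=3)
    nodes = retriever.retrieve("What does the author think about Star Trek?")
    context = "\n".join(node.get_content() for node in nodes)

    # 2. grounded generation: the LLM answers from that context, and the
    #    retrieved chunks show the user what the answer is based on
    answer = Settings.llm.complete(
        "Answer using only this context:\n" + context
        + "\n\nQuestion: What does the author think about Star Trek?"
    )
    print(answer)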
