RAG-based LLM with long-term memory via a vector database
This repository lets a large language model use long-term memory stored in a vector database. The approach is called RAG (Retrieval Augmented Generation): a technique that allows an LLM to retrieve facts from an external store at answer time. The application is built on mistral-7b-instruct-v0.2.Q4_K_M.gguf and chromadb.
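The core RAG flow can be sketched as follows. This is a minimal, dependency-free illustration of the idea, not the repository's actual code: a toy bag-of-words embedding stands in for a real embedding model, and a plain Python list stands in for chromadb (which handles embedding and similarity search itself). All function and class names here are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: a lower-cased bag of words. A real setup would use
    # a sentence-embedding model; chromadb applies one automatically.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class MemoryStore:
    """Stand-in for a vector-database collection."""

    def __init__(self):
        self.memories = []  # list of (text, embedding) pairs

    def add(self, text: str) -> None:
        # Roughly what storing a new memory does.
        self.memories.append((text, embed(text)))

    def query(self, text: str, k: int = 1):
        # Roughly what retrieving the most relevant memory does.
        q = embed(text)
        ranked = sorted(self.memories, key=lambda m: cosine(q, m[1]), reverse=True)
        return [t for t, _ in ranked[:k]]

def build_prompt(query: str, store: MemoryStore) -> str:
    # Retrieved memories are prepended so the LLM can ground its answer.
    context = "\n".join(store.query(query))
    return f"Relevant memories:\n{context}\n\nUser: {query}"
```

For example, after `store.add("My name is Rustam Akimov")`, calling `build_prompt("who is Rustam Akimov", store)` produces a prompt that includes that memory, which is how the model can later answer the question correctly.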
- add new memory: type `remem` before your query (adds your query to the vector db)
- query memory: type `mem` before your query (retrieves the most relevant memory from the db)
- web search: type `web` before your query (searches Google)
You > Hi
LOG: [Response]
Bot < Hello! How can I assist you today?
You > web who is Pavel Durov
LOG: [Searching]
Bot < According to the search results provided, Pavel Durov is a Russian entrepreneur who co-founded Telegram Messenger Inc. He was also involved in developing The Open Network (TON), but later withdrew from the project due to litigation with the US Securities and Exchange Commission (SEC).
You > mem who is Rustam Akimov
LOG: [Querying memory]
Bot < According to the input memories, your name is Rustam Akimov.
- Install the dependencies from requirements.txt
- Download mistral-7b-instruct-v0.2.Q4_K_M.gguf (Note: you can use other models)
- Get Google API key and Search Engine ID
- Specify variables in .env
- Choose GPT4All or LLAMA_cpp_python bindings in chat.py
- Run chat.py