Status: In Development
This project is an experimental implementation of a long-term memory system for Large Language Model (LLM) agents. It explores how to overcome the inherent limitation of finite context windows in LLMs by creating a persistent, searchable memory using a local SQLite database.
LLM agents are powerful, but they suffer from amnesia. Their memory is limited to the current conversation history (short-term memory). Once the context window is full, they forget past interactions. This project aims to solve this by providing a mechanism for the agent to store and retrieve relevant information from a long-term memory store.
The core of this project is the CompositeAgent, which orchestrates a set of specialized LLM agents to manage both short-term and long-term memory.
-
LLMAgent: A basic agent responsible for direct interaction with an LLM through an API. It can be configured with different models and system prompts. -
CompositeAgent: The main orchestrator. It manages the overall chat flow and memory operations. It contains three specialized agents:- Expert Agent: The main "personality" of the agent that chats with the user.
- Tagger Agent: An agent whose sole purpose is to generate descriptive tags from a piece of text.
- Interpreter Agent: An agent that synthesizes and summarizes information retrieved from the long-term memory.
-
Short-Term Memory: A simple list of the most recent interactions in the conversation history.
-
Long-Term Memory: An SQLite database (
<agent_name>_db.sqlite) that stores information in a structured way.- Content Table: Stores chunks of text.
- Tags Table: Stores unique tags.
- Content-Tags Table: Links content with tags, creating a many-to-many relationship.
- User Input: The user sends a message to the
CompositeAgent. - Context Tagging: The
Tagger Agentanalyzes the conversation history (short-term memory) and generates a set of tags that represent the current context. - Long-Term Memory Retrieval: The
CompositeAgentuses these tags to search the SQLite database for relevant content. The search query prioritizes content that matches more tags and has been retrieved more often (based on a "points" system). - Information Interpretation: The retrieved content is passed to the
Interpreter Agent, which creates a concise summary. - Expert Response: The
Expert Agentreceives the user's original prompt and the summary from theInterpreter Agent. This summary acts as a "memory" that informs the expert's response. - Memory Storage:
- The user's prompt and the agent's final response are individually sent to the
Tagger Agentto generate new tags. - Both the prompt and the response are saved as new entries in the long-term memory, associated with their respective tags and the context tags.
- The user's prompt and the agent's final response are individually sent to the
-
Prerequisites:
- Python 3
- An Ollama server (or any other LLM server with a compatible API) running and accessible from your network.
- The
requestslibrary. You can install it with:pip install requests
-
Configuration:
- Open
main.py. - Modify the
CompositeAgentparameters:agent_name: A name for your agent (this will also be the name of the database file).server_ip: The IP address and port of your LLM server.expert_model,tagger_model,interpreter_model: The names of the models you want to use for each agent.system_prompt: The system prompt for theExpert Agent.
- Open
-
Running the Agent:
python main.py
This project is designed to be modular and extensible. You can:
- Swap Models: Easily change the models for each specialized agent to experiment with different LLMs for different tasks (e.g., a small, fast model for tagging and a powerful model for the expert).
- Modify Prompts: Change the system prompts in
composite_agent.pyto alter the behavior of the tagger and interpreter. - Extend the Database Schema: Add more metadata to the
contenttable, such as timestamps or relationship types. - Improve Retrieval Logic: Modify the
get_related_content_by_tagsmethod to experiment with different information retrieval algorithms.
-
Forget: A point system is currently in place, where each "memory" increases by one point each time it is returned. The plan is to have a defined limit so that when a "memory" exceeds a maximum time limit and a minimum number of points, it is forgotten.
-
Notion of time: Add a notion of time to the agents.
-
New Version dos agentes
- tagger: recibe las tags actuales del contexto + un mensaje a la vez y actualiza las tags de contexto
- expert: recibe las interacciones del contexto y genera una respuesta
Flujo de informacion:
- se recibe el mensaje del usuario
- se le pasan las tags de contexto + el mensaje del usuario y el agente actualiza las tags
- se consultan en la bd las tags pertinentes y se devuelven las interacciones con fecha (nombre del mes, numero del dia y el dia en palabras) y hora
- se le pasan las interacciones de contexto al experto junto con el mensaje del usuario y se genera la respuesta
- el tagger toma las tags de contexto + la respuesta del agente y genera nuevas tags
- nuevo ciclo iniciado