vanna-ai / vanna

🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using RAG 🔄.

Home Page: https://vanna.ai/docs/

Vanna training stores duplicates

andreped opened this issue

Describe the bug
Surprisingly, if I save the same question-SQL pair twice, ChromaDB ends up storing two identical entries.

It would be great if Vanna had a mechanism to prevent duplicates. I doubt there is any application where full duplicates make sense.

It is reasonable to allow one question to be linked to multiple SQL queries, but exact question-SQL duplicates should not be stored.

Expected behavior
Saving a question-SQL pair that already exists should either be ignored or replace the existing entry, and the user should be warned when this happens.
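One way to get the "ignore and warn" behavior is to derive each entry's ID from its content, so that saving the same pair twice maps to the same key. The sketch below is a minimal, hypothetical in-memory stand-in for a vector store collection (the class `QuestionSQLStore` and its methods are illustrative, not Vanna's API). It uses `uuid.uuid5` over a JSON encoding of the pair, so the same question with different SQL still gets its own entry, as suggested above.

```python
import json
import uuid
import warnings


class QuestionSQLStore:
    """Hypothetical in-memory store illustrating content-derived IDs."""

    def __init__(self) -> None:
        self._entries: dict[str, dict[str, str]] = {}

    @staticmethod
    def _make_id(question: str, sql: str) -> str:
        # The ID depends only on the content, so an identical pair
        # always produces the same ID.
        payload = json.dumps({"question": question, "sql": sql}, sort_keys=True)
        return str(uuid.uuid5(uuid.NAMESPACE_OID, payload))

    def add_question_sql(self, question: str, sql: str) -> str:
        entry_id = self._make_id(question, sql)
        if entry_id in self._entries:
            # Full duplicate: skip the insert and warn the caller.
            warnings.warn(
                f"Question-SQL pair already stored under id {entry_id}; ignoring."
            )
            return entry_id
        self._entries[entry_id] = {"question": question, "sql": sql}
        return entry_id

    def __len__(self) -> int:
        return len(self._entries)
```

Calling `add_question_sql` twice with the same question and SQL returns the same ID, emits one warning, and leaves a single entry; changing only the SQL creates a second, distinct entry.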

The same issue affects the add_ddl method, at least for the built-in ChromaDB_VectorStore.
Here is a quick possible fix: #336
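The following is not the code from #336; it is a hypothetical sketch of the "replace" alternative applied to DDL. Each DDL string is keyed by a SHA-256 hash of its text, and writing under an existing key overwrites the entry instead of adding a second one, similar in spirit to ChromaDB's `Collection.upsert`. The `DDLStore` class is illustrative only.

```python
import hashlib
import warnings


class DDLStore:
    """Hypothetical in-memory store with upsert-style DDL writes."""

    def __init__(self) -> None:
        self._entries: dict[str, str] = {}

    def add_ddl(self, ddl: str) -> str:
        # Content-derived key: identical DDL text always hashes to the same ID.
        entry_id = hashlib.sha256(ddl.encode("utf-8")).hexdigest()
        if entry_id in self._entries:
            warnings.warn(f"DDL already stored under id {entry_id}; replacing it.")
        # Upsert: assigning to an existing key replaces the entry
        # rather than appending a duplicate.
        self._entries[entry_id] = ddl
        return entry_id

    def __len__(self) -> int:
        return len(self._entries)
```

Because the key is derived from the exact text, this only catches byte-for-byte duplicates; DDL that differs only in whitespace or casing would still be stored separately.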

The fix has been merged and will be part of the upcoming release.