vanna-ai / vanna

🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using RAG 🔄.

Home Page:https://vanna.ai/docs/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Error in SQL Syntax using MSSQL server (previously reported)

dhait opened this issue · comments

Copied from #310
THIS WAS PREVIOUSLY REPORTED AND CLOSED WITHOUT A FIX

Describe the bug
I'm using Vanna connected to a sample Microsoft SQL server and I'm having some trouble with the SQL generated by Vanna when asking for X top Customers or Y top songs or Z top sales. The generated SQL always uses function LIMIT which is correct when using SQLite or MySQL, but it should use SELECT TOP instead when using SQL server. This causes an error, of course. I tried setting SQL example queries using the correct format in the training data but even when I ask one of the questions that I've set as an example it always tries to use LIMIT instead of SELECT TOP. I believe it is incompatible with MSSQL. Any ideas?

To Reproduce
Steps to reproduce the behavior:
vn.train(
question="Who are the top 10 customers?",
sql="SELECT top 10 C.CUSTOMERID, C.FirstName, C.LastName, SUM(I.TOTAL) FROM CUSTOMER C JOIN INVOICE I ON C.CUSTOMERID = I.CUSTOMERID GROUP BY C.CUSTOMERID, C.FirstName, C.LastName ORDER BY SUM(I.TOTAL) DESC")

vn.generate_sql(quest="Who are the top 10 customers?")

answer

Expected behavior
SELECT top 10 C.CUSTOMERID, C.FirstName, C.LastName, SUM(I.TOTAL) FROM CUSTOMER C JOIN INVOICE I ON C.CUSTOMERID = I.CUSTOMERID GROUP BY C.CUSTOMERID, C.FirstName, C.LastName ORDER BY SUM(I.TOTAL) DESC

Actual behaviour
SELECT t.Name AS TrackName, SUM(il.Quantity) AS TotalQuantitySold FROM Track t INNER JOIN InvoiceLine il ON t.TrackId = il.TrackId GROUP BY t.Name ORDER BY TotalQuantitySold DESC LIMIT 10

Error / LOGS
Couldn't run sql: (pyodbc.ProgrammingError) ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near 'LIMIT'. (102) (SQLExecDirectW)")
[SQL: SELECT t.Name AS TrackName, SUM(il.Quantity) AS TotalQuantitySold
FROM Track t
INNER JOIN InvoiceLine il ON t.TrackId = il.TrackId
GROUP BY t.Name
ORDER BY TotalQuantitySold DESC
LIMIT 10]
(Background on this error at: https://sqlalche.me/e/20/f405)

Desktop (please complete the following information where):

OS: Windows
Version: 11
Python: [3.11.8]
Vanna: [2.8.0]
Additional context
Add any other context about the problem here.

As far as we've been able to determine, this appears to happen when you're using Chinook as a sample dataset. We believe that this is happening because the foundational models have a bunch of pre-training relating to Chinook that are tuned to other SQL flavors so the pre-training is effectively overriding the in-context learning.

If you're doing a demo for SQL Server, the recommendation is to use AdventureWorks: https://learn.microsoft.com/en-us/sql/samples/adventureworks-install-configure?view=sql-server-ver16&tabs=ssms