microsoft / ContextualSP

Multiple paper open-source codes of the Microsoft Research Asia DKI group

Semantic parsing in context predict sql

eche043 opened this issue · comments

Hello everyone in the Semantic Parsing in Context repository. Predicted SQL queries that contain a WHERE clause are never correct.
Example: what is the abbreviation for Jetblue?
The predicted query is "SELECT airlines.abbreviation FROM airlines WHERE airlines.airline = 1".
As you can see, the value associated with WHERE is 1 instead of Jetblue.
It is the same for all queries with WHERE.
Is there a way to resolve this?
Thanks in advance.

commented

@eche043 Hi, thanks for your question. This is because, for SParC/CoSQL, our models do not generate executable SQL queries, and the values in WHERE are not evaluated. If you would like to predict the values, please allow me some time to check whether our codebase is suitable for that. Thanks!

@SivilTaram OK, thanks for your answer. I'll wait.

commented

@eche043 Sorry for the late response (I'm on vacation these days). After reviewing the current code, I must say that it is non-trivial to change the code of semantic_parsing_in_context to support the prediction of table values in clauses such as WHERE. This is mainly because our parser is based on SemQL (proposed in https://arxiv.org/abs/1905.08205) instead of the original SQL, and SemQL does not consider table values in its representation. If you would like to make the code work for executable SQL prediction, you may manually change the grammar definition here.

However, that is usually hard to finish within a few weeks. Therefore, for now, I would recommend modifying the code of UniSAR (here) to make it work for SParC. Although the current codebase targets Spider, IMO you could directly concatenate the previous utterances with the current one to get strong performance on context-dependent semantic parsing such as SParC. Maybe @DreamerDeo could help with this.
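To illustrate the concatenation idea, here is a minimal sketch. The function name `build_contextual_input`, the `" | "` separator and the `max_turns` cutoff are all assumptions made for illustration; none of them comes from the UniSAR codebase, and the real preprocessing may expect a different input format.

```python
from typing import List


def build_contextual_input(history: List[str], current: str,
                           sep: str = " | ", max_turns: int = 3) -> str:
    """Join the most recent previous utterances with the current one.

    `sep` and `max_turns` are illustrative choices, not UniSAR settings.
    """
    # Guard against max_turns == 0, because history[-0:] would return everything.
    recent = history[-max_turns:] if max_turns > 0 else []
    return sep.join(recent + [current])


# A made-up SParC-style interaction: each turn sees the turns before it.
turns = [
    "Show all airlines.",
    "Which of them are based in the USA?",
    "What is the abbreviation for Jetblue?",
]
for i, utterance in enumerate(turns):
    print(build_contextual_input(turns[:i], utterance))
```

Each printed line would then be used as the question for a single-turn Spider-style example, paired with the SQL of that turn.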

Hi @eche043,

As mentioned by @SivilTaram above, our target benchmarks (Spider/CoSQL/SParC) do not require predicting values (as stated on their websites), so the current UniSAR predicts value-free SQL to simplify the output distribution (keywords, columns, headers and 'value' placeholders). Its output is not executable either, but you could make it executable with some simple modifications.

To predict SQL with values (i.e., make the SQL executable), you could directly (1) adopt query_toks as the target SQL, rather than query_toks_no_value, in both step1-preprocess and step3-evaluation; (2) insert the values into the serialized schema like PICARD[1] or BRIDGE[2], so that UniSAR can copy the values from the input.
Actually, we have predicted SQL with values on DuSQL (a Chinese multi-table text2sql benchmark), and UniSAR does really well[3].
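The two steps can be sketched roughly as follows. This is only an illustration under assumptions: `query_toks` and `query_toks_no_value` are the field names of the Spider JSON files, but the helper names `target_sql` and `serialize_schema`, the serialization format (loosely in the style of PICARD) and the naive substring matching are hypothetical and are not taken from the UniSAR, PICARD or BRIDGE code. The example record is made up.

```python
from typing import Dict, List


def target_sql(example: dict, with_values: bool = True) -> str:
    """Step (1): pick the token list that becomes the training target."""
    key = "query_toks" if with_values else "query_toks_no_value"
    return " ".join(example[key])


def serialize_schema(question: str,
                     tables: Dict[str, Dict[str, List[str]]]) -> str:
    """Step (2): append cell values mentioned in the question to their column.

    `tables` maps table -> column -> cell values. Matching is a plain
    case-insensitive substring test, which is far cruder than BRIDGE's
    fuzzy matching and is only meant to show the idea.
    """
    q = question.lower()
    parts = []
    for table, columns in tables.items():
        cols = []
        for column, values in columns.items():
            matched = [v for v in values if v.lower() in q]
            cols.append(f"{column} ( {' , '.join(matched)} )" if matched else column)
        parts.append(f"{table} : " + " , ".join(cols))
    return " | ".join(parts)


# Made-up example in the shape of a Spider record.
example = {
    "question": "What is the abbreviation for Jetblue?",
    "query_toks": ["SELECT", "abbreviation", "FROM", "airlines",
                   "WHERE", "airline", "=", "'Jetblue'"],
    "query_toks_no_value": ["select", "abbreviation", "from", "airlines",
                            "where", "airline", "=", "value"],
}
tables = {"airlines": {"airline": ["United Airlines", "Jetblue"],
                       "abbreviation": ["UAL", "JB"]}}

print(target_sql(example))
print(serialize_schema(example["question"], tables))
```

With the value present in the input, the model has something to copy from when it generates the WHERE clause.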

However, the current UniSAR constrained decoding does not support value prediction (we are working on this, but it is really tricky to construct the prefix tree of values).
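For readers unfamiliar with the idea, a prefix tree for constrained decoding can be sketched as below. This is a generic toy trie over pre-tokenized values, not UniSAR's implementation; the class name `ValueTrie`, the `END` marker and the token splits are assumptions. It also sidesteps the hard parts, such as subword tokenization and databases with very many cell values.

```python
from typing import Dict, Iterable, List, Sequence

END = "<end>"  # hypothetical marker meaning "a complete value ends here"


class ValueTrie:
    """Toy prefix tree over tokenized database values."""

    def __init__(self, values: Iterable[Sequence[str]]):
        self.root: Dict[str, dict] = {}
        for tokens in values:
            node = self.root
            for tok in tokens:
                node = node.setdefault(tok, {})
            node[END] = {}

    def allowed_next(self, prefix: Sequence[str]) -> List[str]:
        """Return the tokens that may follow `prefix`; empty if the prefix is invalid."""
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []
            node = node[tok]
        return sorted(node.keys())


# Made-up values, already split into tokens.
trie = ValueTrie([["Jet", "blue"], ["Jet", "Airways"], ["Delta"]])
print(trie.allowed_next([]))
print(trie.allowed_next(["Jet"]))
print(trie.allowed_next(["Delta"]))
```

During decoding, the tokens returned by `allowed_next` would be the only ones left unmasked while the model is inside a value.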

[1]: PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models
[2]: Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing
[3]: UNISAR: A Unified Structure-Aware Autoregressive Language Model for Text-to-SQL

commented

Closed since there is no more activity.

@longxudou can you please expand on that explanation of how to get the WHERE clause predicted correctly?