Possible parser bug?
nweir127 opened this issue
Hello,
I am testing out the dataset loader and am getting errors on particular training examples (not from the spider set):
self.text_to_instance(
utterance=ex['question'], # 'how many problem logs are there out of the problems with the most staff'
db_id=ex['db_id'], # tracking_software_problems
sql=query_tokens) # ['select', 'count', '(', '*', ')', 'from', 'problem_log', 'where', 'problem_log@problem_id', '=', '(', 'select', 'problems@problem_id', 'from', 'problems', 'where', 'problems@closure_authorised_by_staff_id', '=', '(', 'select', 'staff@staff_id', 'from', 'staff', 'group', 'by', 'staff@staff_id', 'order', 'by', 'count', '(', '*', ')', 'desc', 'limit', '1', ')', 'limit', '1', ')']
I received the parser error: Rule 'statement' matched in its entirety, but it didn't consume all the text. The non-matching portion of the text begins with '= ( select problems@' (line 1, column 66).
Does this imply that the parser doesn't accept nested queries of this form? The query does execute successfully against the database.
The parser accepts nested queries in general, but the grammar was tuned mostly for the Spider dataset, so it wouldn't be surprising if it needs slight adjustments to fit other examples.
You can see the grammar in this file: https://github.com/benbogin/spider-schema-gnn-global/blob/master/semparse/contexts/spider_db_grammar.py
You can see that a where_clause is made of expressions (expr), and that each expr can also be a nested query (source_subq). Somewhere, your query doesn't perfectly fit this grammar. This is a bit tricky to debug, but the best way to start is to remove small pieces of the query (probably starting from the sub-query) until it parses, or, going the other way, to start with a simple query and add small parts one at a time.
When you find what makes it fail, you could add/change the grammar rules accordingly.
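One way to do that narrowing automatically is to test each nested sub-query on its own, innermost first, and report the first piece that fails. Below is a minimal sketch, not code from the repository. `subqueries_innermost_first` and `first_failing` are hypothetical helpers, and `parses` is assumed to be a callable you write yourself that wraps the grammar and returns True when a token list parses. The `toy_parses` function is only a stand-in so the example runs; it does not reflect what the real grammar accepts.

```python
from typing import Callable, List, Optional

Tokens = List[str]


def subqueries_innermost_first(tokens: Tokens) -> List[Tokens]:
    """Return every parenthesized `select ...` span, innermost first, then the full query."""
    found: List[Tokens] = []
    open_parens: List[int] = []  # indices of '(' tokens that are still open
    for i, tok in enumerate(tokens):
        if tok == '(':
            open_parens.append(i)
        elif tok == ')' and open_parens:
            start = open_parens.pop()
            inner = tokens[start + 1:i]
            # Keep only sub-queries; skip spans such as count ( * ).
            if inner and inner[0] == 'select':
                found.append(inner)
    found.append(list(tokens))
    return found


def first_failing(tokens: Tokens, parses: Callable[[Tokens], bool]) -> Optional[Tokens]:
    """Return the smallest nested piece that `parses` rejects, or None if all pieces parse."""
    for candidate in subqueries_innermost_first(tokens):
        if not parses(candidate):
            return candidate
    return None


def toy_parses(tokens: Tokens) -> bool:
    # Stand-in for the real grammar (assumption): rejects `= (` comparisons.
    return not any(a == '=' and b == '(' for a, b in zip(tokens, tokens[1:]))


query = ['select', 'a', 'from', 't', 'where', 'x', '=',
         '(', 'select', 'y', 'from', 'u', 'limit', '1', ')']
culprit = first_failing(query, toy_parses)
print(' '.join(culprit) if culprit else 'everything parsed')
```

If the innermost sub-queries parse and only an enclosing query fails, the problem is most likely in how that enclosing query uses the sub-query, which tells you which grammar rule to look at.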
Hope that helps.