Possible parser bug?

Question

nweir127 opened this issue 5 years ago · comments

Hello,

I am testing out the dataset loader and am getting errors on particular training examples (not from the spider set):

self.text_to_instance(
    utterance=ex['question'],  # 'how many problem logs are there out of the problems with the most staff'
    db_id=ex['db_id'],  # tracking_software_problems
    sql=query_tokens)  # ['select', 'count', '(', '*', ')', 'from', 'problem_log', 'where', 'problem_log@problem_id', '=', '(', 'select', 'problems@problem_id', 'from', 'problems', 'where', 'problems@closure_authorised_by_staff_id', '=', '(', 'select', 'staff@staff_id', 'from', 'staff', 'group', 'by', 'staff@staff_id', 'order', 'by', 'count', '(', '*', ')', 'desc', 'limit', '1', ')', 'limit', '1', ')']

I received the following parser error: "Rule 'statement' matched in its entirety, but it didn't consume all the text. The non-matching portion of the text begins with '= ( select problems@' (line 1, column 66)." Does this imply that the parser doesn't accept nested queries of this form? The query does execute successfully against the database.

Ben Bogin · Answer 1 · Tue Oct 29 2019 00:18:35 GMT+0800 (China Standard Time)

In general the parser accepts nested queries, but the grammar was tuned mostly for the spider dataset, so it wouldn't be surprising if it needs slight adjustments to fit other examples.

You can see the grammar in this file: https://github.com/benbogin/spider-schema-gnn-global/blob/master/semparse/contexts/spider_db_grammar.py

You can see that a where_clause is made of expressions (expr), and that each expr can also be a nested query (source_subq). Somewhere, your query doesn't quite fit this grammar. This is a bit tricky to debug, but a good way to start is to remove small pieces of the query (probably starting with the sub-query) until it parses, or to go the opposite way: start with a simple query and add small parts one at a time.
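
The "remove small pieces until it parses" approach could be automated roughly as below. This is only a sketch, not code from the repository: innermost_subquery, shrink_until_parses, and toy_parses are hypothetical names, and toy_parses is a stand-in predicate that simply rejects any nested select. In practice you would replace it with a function that runs the real grammar on the query string and returns False when the parser raises.

```python
from typing import Callable, List, Optional, Tuple

# Token string taken from the failing example in the question.
TOKENS = (
    "select count ( * ) from problem_log where problem_log@problem_id = "
    "( select problems@problem_id from problems where "
    "problems@closure_authorised_by_staff_id = "
    "( select staff@staff_id from staff group by staff@staff_id "
    "order by count ( * ) desc limit 1 ) limit 1 )"
).split()


def innermost_subquery(tokens: List[str]) -> Optional[Tuple[int, int]]:
    """Return (start, end) indices of an innermost '( select ... )' span."""
    stack: List[int] = []
    for i, tok in enumerate(tokens):
        if tok == "(":
            stack.append(i)
        elif tok == ")" and stack:
            start = stack.pop()
            # The first closed span that opens with 'select' has no nested
            # sub-query inside it, because any such span would close earlier.
            if tokens[start + 1] == "select":
                return start, i
    return None


def shrink_until_parses(
    tokens: List[str],
    parses: Callable[[str], bool],
    placeholder: str = "1",
) -> Tuple[Optional[List[str]], List[List[str]]]:
    """Replace innermost sub-queries with a placeholder until parses() accepts.

    Returns the first accepted token list (or None if nothing is left to
    remove) together with the sub-queries that were removed, in order.
    """
    current = list(tokens)
    removed: List[List[str]] = []
    while not parses(" ".join(current)):
        span = innermost_subquery(current)
        if span is None:
            return None, removed  # the problem is not inside a sub-query
        start, end = span
        removed.append(current[start:end + 1])
        current = current[:start] + [placeholder] + current[end + 1:]
    return current, removed


def toy_parses(query: str) -> bool:
    """Stand-in for the real parser (assumption): rejects any nested select."""
    return "( select" not in query


if __name__ == "__main__":
    accepted, removed = shrink_until_parses(TOKENS, toy_parses)
    print("accepted:", " ".join(accepted) if accepted else None)
    for sub in removed:
        print("removed: ", " ".join(sub))
```

With the real parser plugged in, the last entry in the removed list is the piece whose removal first made the query parse, so it is a reasonable place to start comparing against the expr and source_subq rules.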

When you find what makes it fail, you could add/change the grammar rules accordingly.
Hope that helps.