prestodb / sql

A Modern SQL frontend based on SQL16 with extensions for streaming, graph, rich types, etc, including parser, resolver, rewriters, etc.

Unparseable statements

manticore-projects opened this issue · comments

Greetings!

First of all: I am impressed by this! So much to learn and gain from it. Thank you for sharing.
I come from JSQLParser, which I joined only because I wanted to write a Java SQL Formatter. All of that brought me to JavaCC eventually.

I am only an Accountant with very limited technical skills, so please bear with me.

Your grammar is so much cleaner than JSQLParser's. JSQLParser has grown over 20 years by adding patch upon patch. As a result, JSQLParser suffers heavily when we try to parse deeply nested expressions (as shown in the examples below).

Unfortunately, SQL is very nasty in that regard because it does not enforce clear encapsulation of logical blocks within brackets. Think about a nested CASE ... WHEN ... ELSE ... END expression.

From that perspective, I went straight to your code and stress-tested it with some of our finest standard-compliant test cases -- and they all failed :-D

On top of that, we would need to talk about Oracle vs. Standard and T-SQL vs. the Standard of course. And keywords!

Long story short: Would you be interested in a much larger Test Suite derived from JSQLParser?
And would you be interested in working with us on hardening the grammar for the dirty SQL observed in real life? You have a far superior understanding of JavaCC. I can provide tests, practical SQL experience, documentation, a website, railroad diagrams, etc.

I would be really happy to re-invent JSQLParser based on your grammar, although it will be a lot of work. Please do let me know what you think.

All the best, thank you and cheers
Andreas

--

SELECT CASE WHEN ( CASE WHEN ( CASE WHEN ( CASE WHEN ( 1 ) THEN 0 END ) THEN 0 END ) THEN 0 END ) THEN 0 END FROM a
SELECT CASE
                WHEN wdgfld.porttype = 1
                    THEN 'INPUT PORT'
                ELSE CASE
                    WHEN wdgfld.porttype = 1
                        THEN 'INPUT PORT'
                    ELSE CASE
                        WHEN wdgfld.porttype = 1
                            THEN 'INPUT PORT'
                        ELSE CASE
                            WHEN wdgfld.porttype = 1
                                THEN 'INPUT PORT'
                            ELSE CASE
                                WHEN wdgfld.porttype = 1
                                    THEN 'INPUT PORT'
                                ELSE CASE
                                    WHEN wdgfld.porttype = 1
                                        THEN 'INPUT PORT'
                                    ELSE CASE
                                        WHEN wdgfld.porttype = 1
                                            THEN 'INPUT PORT'
                                        ELSE CASE
                                            WHEN wdgfld.porttype = 1
                                                THEN 'INPUT PORT'
                                            ELSE CASE
                                                WHEN wdgfld.porttype = 1
                                                    THEN 'INPUT PORT'
                                                ELSE CASE
                                                    WHEN wdgfld.porttype = 1
                                                        THEN 'INPUT PORT'
                                                    ELSE CASE
                                                        WHEN wdgfld.porttype = 1
                                                            THEN 'INPUT PORT'
                                                        ELSE CASE
                                                            WHEN wdgfld.porttype = 1
                                                                THEN 'INPUT PORT'
                                                            ELSE CASE
                                                                WHEN wdgfld.porttype = 1
                                                                    THEN 'INPUT PORT'
                                                                ELSE CASE
                                                                    WHEN wdgfld.porttype = 1
                                                                        THEN 'INPUT PORT'
                                                                    ELSE '0'
                                                                END
                                                            END
                                                        END
                                                    END
                                                END
                                            END
                                        END
                                    END
                                END
                            END
                        END
                    END
                END
            END columnalias
FROM table1
;
SELECT ((((((((((((((((tblA)))))))))))))))) FROM mytable
SELECT Round( Round( Round( Round( Round( Round( Round( Round( Round( Round( Round( Round( Round( Round( Round( Round( Round( Round( Round( Round( Round( Round( Round( Round( Round( Round( Round( Round( Round( Round( Round( Round( 0, 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 ), 0 )
;
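
The statements above follow a regular pattern, so variants of any depth can be generated rather than written by hand. Below is a minimal sketch of such a generator. The class name `NestedSqlGenerator` and its method names are made up for illustration and are not part of JSQLParser or of this repository. It only builds strings and makes no claim about which depths any parser accepts.

```java
// Hypothetical helper (not part of any parser project): builds deeply
// nested SQL strings in the style of the statements above.
final class NestedSqlGenerator {
    private NestedSqlGenerator() {}

    // Repeats a string n times (written out so it also compiles on Java 8).
    private static String repeat(String s, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(s);
        }
        return sb.toString();
    }

    // SELECT ((tblA)) FROM mytable, with `depth` pairs of parentheses.
    static String nestedParens(int depth) {
        return "SELECT " + repeat("(", depth) + "tblA" + repeat(")", depth) + " FROM mytable";
    }

    // SELECT Round( Round( 0, 0 ), 0 ), with `depth` nested calls.
    static String nestedRound(int depth) {
        StringBuilder expr = new StringBuilder("0");
        for (int i = 0; i < depth; i++) {
            expr.insert(0, "Round( ").append(", 0 )");
        }
        return "SELECT " + expr;
    }

    // CASE ... ELSE CASE ... END END chain, `depth` levels deep.
    static String nestedCaseElse(int depth) {
        StringBuilder expr = new StringBuilder("'0'");
        for (int i = 0; i < depth; i++) {
            expr.insert(0, "CASE WHEN wdgfld.porttype = 1 THEN 'INPUT PORT' ELSE ").append(" END");
        }
        return "SELECT " + expr + " columnalias FROM table1";
    }

    public static void main(String[] args) {
        // Print one statement of each kind at a small depth.
        System.out.println(nestedParens(3));
        System.out.println(nestedRound(3));
        System.out.println(nestedCaseElse(3));
    }
}
```

Increasing the depth step by step is one way to find out whether a failure depends on nesting depth or on the construct itself.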

Greetings!

Thanks for the prompt feedback.
I am confused, though: I added those statements to TEST_SQL_TESTSTRINGS in TestSqlParser.java, and then both tests, smokeTest and parseUnparseTest, failed. I think I did the right thing, because some of my other statements succeeded.

JavaCC is 7.0.10, as pulled by Maven. I could try 7.0.12 and maybe your new 8 branch.
The tests throw the errors shown below, but only when I add the statements shown above. Other samples have worked very well (which really impressed me).

[ERROR] Tests run: 9, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 3.625 s <<< FAILURE! - in TestSuite
[ERROR] parseUnparseTest(com.facebook.coresql.parser.TestSqlParser)  Time elapsed: 0.128 s  <<< FAILURE!
java.lang.AssertionError: expected object to not be null
	at org.testng.Assert.fail(Assert.java:94)
	at org.testng.Assert.assertNotNull(Assert.java:406)
	at org.testng.Assert.assertNotNull(Assert.java:391)
	at com.facebook.coresql.parser.TestSqlParser.parseUnparseTest(TestSqlParser.java:146)

[ERROR] smokeTest(com.facebook.coresql.parser.TestSqlParser)  Time elapsed: 0.058 s  <<< FAILURE!
java.lang.AssertionError: expected object to not be null
	at org.testng.Assert.fail(Assert.java:94)
	at org.testng.Assert.assertNotNull(Assert.java:406)
	at org.testng.Assert.assertNotNull(Assert.java:391)
	at com.facebook.coresql.parser.TestSqlParser.smokeTest(TestSqlParser.java:136)
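
The trace reports a failed assertNotNull but not which statement caused it. One way to narrow this down is to run each statement on its own and collect the ones that yield no result. The sketch below assumes the parser can be wrapped as a function that returns null (or throws) for a statement it cannot parse. The class `FailingStatementFinder` and the stand-in parser in `main` are hypothetical and do not reflect the actual API in TestSqlParser.java.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: reports which statements a parser rejects.
final class FailingStatementFinder {
    private FailingStatementFinder() {}

    // `parser` is an assumed wrapper around the real parser: it returns the
    // parsed tree, or null (or throws a RuntimeException) on failure.
    static List<String> findUnparseable(List<String> statements, Function<String, Object> parser) {
        List<String> failed = new ArrayList<>();
        for (String sql : statements) {
            Object ast;
            try {
                ast = parser.apply(sql);
            } catch (RuntimeException e) {
                ast = null; // treat an exception the same as "no tree"
            }
            if (ast == null) {
                failed.add(sql);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Stand-in parser for demonstration only: rejects anything containing CASE.
        Function<String, Object> fakeParser = sql -> sql.contains("CASE") ? null : sql;
        List<String> statements = Arrays.asList(
                "SELECT 1",
                "SELECT CASE WHEN ( 1 ) THEN 0 END FROM a");
        for (String sql : findUnparseable(statements, fakeParser)) {
            System.out.println("unparseable: " + sql);
        }
    }
}
```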

#29 has your test cases.

As for T-SQL, Oracle, etc., I had an interesting way of doing it in my previous startup, Metanautix. Most of it is definitely doable if you can get their parser, which I actually had at another startup even earlier (Fortify Software, circa 2007). We actually got the Oracle SQL parser under license from them and wrote security analyzers for Oracle SQL. Same with T-SQL as well. But unfortunately, they are not open source, and MSFT and ORCL do not release those grammars.

Thanks!

I will try again and add more tests in the similar way.
Also I will try to understand what I did wrong.

On T-SQL/Oracle, it's very doable by amending the grammar manually. Let me experiment a bit more; it seems to be a very worthwhile investment!

Thanks a lot!

When you want to amend the grammar, see if you can do it outside, in a separate fragment with a call-out, just like I did for the Presto extensions. That way we can keep the whole thing intact and move it in sync.