JSQLParser / JSqlParser

JSqlParser parses an SQL statement and translate it into a hierarchy of Java classes. The generated hierarchy can be navigated using the Visitor Pattern

Home Page:https://github.com/JSQLParser/JSqlParser/wiki

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

should bracket quotation (like sqlserver supports it) be removed to be able to support array constructs

wumpz opened this issue · comments

At the moment JSqlParser supports quotation like [colname]. This is the reason it cannot support simple array constructs using brackets as well.

As the bracket quotation, IMHO, should be replaced by double quotes.

Should the bracket quotation be removed to get the way free to arrays?

So no opinion from JSqlParsers users so far.

I tend to do extactly this, removing bracket quotation for version 2.0 of JSqlParser, to do array processing.

since we have seen arrays but not this quotation, I would vote for yes

So be it. Starting with version 2 JSqlParser drops support for bracket quotation.

There is a huge history of SQL that has continued to run against SQL Server for decades that uses the square bracket-quoted identifiers. If I recall correctly, some SQL Server syntax can only be accessed via the square brackets (i.e. FOR XML directives).

Does someone have an example of array syntax with an explanation of how support for that precludes support for square bracket-quoted identifiers?

Some would be (from Postgresqls manual (http://www.postgresqltutorial.com/postgresql-array/) ):

SELECT
 id,
 name,
 phones [ 2 ]
FROM
 contacts
WHERE
 id = 3;

SqlServer would interpret [2] as an alias, which is perfectly fine. Nevertheless array access using brackets is more or less standardized as well within most programming languages.

IMO it certainly would be a shame if "drops support for bracket quotation" meant that v2.x would be completely unable to parse T-SQL like

SELECT [column name] FROM [table name];

As mentioned in a previous comment, like it or not there is an enormous amount of T-SQL out there that uses [column name] instead of "column name", despite what ANSI (or others) might say.

I will happily reopen this issue. I as well was surprised, that so little seemed to be interested. I want a discussion. I want suggestions how to implement it. I would keep both, if it works, but I assume is collides to much:

select a [b]

is using sqlserver an alias and using e.g. postgres an array access.

If it is a question of dropping the currently-supported [bracket quoting] in order to support array constructs then one possibility might be to add the option to specify something like tsqlQuoting when creating a parser. If tsqlQuoting is true then [thing] would be a table/column reference, but if it is false then we may be able to safely assume that [thing] is an array reference (because a quoted table/column name would presumably be the ANSI-quoted "thing").

One can achieve this by semantic lookahead?

One can achieve this by semantic lookahead?

Maybe. Automatically detecting [column_name] vs. [array_reference] would certainly be nice if that is practical given the variety of SQL dialects out there. In other words, is it reasonable to expect that we could define parsing rules (possibly including lookahead) to reliably differentiate between the two cases for an arbitrary SQL statement without prior knowledge of the dialect?

Providing an explicit hint would definitely make it easier for the parser, in any case.

In the past, I tried various variations to tackle this problem. I could not get a proper solution. Any help would be appreciated.
One problem is, that an array construct could hold any type of expression.

A solution would be to switch off and on a defined token. But I could not achieve this.

Version 2 is released with bracket support. Skipped removal due to hopefully ongoing discussion.

how to parser 'begin' statement?

@kiover could it be that you wanted to open a new issue?

@wumpz It would be nice if you put SQL extensions behind feature flags.
e.g.

CCJSqlParserUtil.builder()
  .withTSQLBracketNotation()
  .withOracleHints()
  .withPostgresArraySyntax()
  .withPostgresSetCommands()
  .build()

(ofc the above would fail because of conflicting options though)
or even just specify a syntax to parse:

new CCJSqlParserUtil(Syntax.POSTGRES)

Again the same problem. How to modify behaviour of tokenizing by flags. Productions you can disable, but tokens? How would you do that using JavaCC? The brackets are part of a token.

Can you use a templated tokeniser? It would require compiling multiple tokenisers but will give you the flexibility to do the new CCJSqlParserUtil(Syntax.POSTGRES) solution.

@SerialVelocity What do you mean by a templated tokenizer? Does JavaCC support something like this? I know JavaCC token modes.

Sorry, I phrased that comment badly. I meant you could use a templating language to generate your JavaCC input files. That way, you can choose one of the parsers based on user/runtime input.

I don't know anything about JavaCC or how you would do this in Maven, but in Gradle you could add a task that runs before JavaCC that takes a file like src/main/jjtree/net/sf/jsqlparser/parser/JSqlParserCC.jjt.tmpl and generates src/main/jjtree/net/sf/jsqlparser/parser/JSqlParserCC.generic.jjt and src/main/jjtree/net/sf/jsqlparser/parser/JSqlParserCC.t-sql.jjt.

commented

@SerialVelocity would it be more feasible to have provider specific implementations or jsqlparser and the common api for parsing in different modules?

jsqlparser - the api that loads a parser implementation
jsqlparser-tsql - specific to tsql

So

Statement stmt = CCJSqlParserUtil.parse("SELECT * FROM tab1");

Would still be the public api but would return a TSQLStatement or MySQLStatement based on what parser impl you have on the classpath?

Since the majority of the code is identical, I think having it all in one module would be better. Most of the parsers are nearly identical and I would expect them to all extend a base class with a common set of SQL. (You will have to be careful with things like "Oracle hints" since a generic statement visitor should probably visit them)

The idea of automatically deciding on a parser impl, depending on what is on the classpath, is magic that isn't needed. It will cause issues in the future where people may want to parse multiple different SQL languages. (e.g. When converting between them).

I think one of the following would work better:

CCJSqlPostgresParserUtil.parse("SELECT * FROM tab1");
CCJSqlParserUtil.parsePostgres("SELECT * FROM tab1");
new CCJPostgresSqlParser().parse("SELECT * FROM tab1");

The reason why I think that is that all of the above can return a concrete type (i.e. PostgresStatement) rather than just a Statement.

This could also be used for things like #791 where limit is only a reserved word for certain SQL languages (https://www.petefreitag.com/tools/sql_reserved_words_checker/?word=limit)

@GitHub4ZhangJX That is unrelated as it uses parentheses, not brackets. Please file a separate issue.

@GitHub4ZhangJX Please file a new issue. Your problem / issue does not belong to this discussion.

@wumpz Did you come to a conclusion about what you think the best way of implementing this would be?

I think at the moment, the fallback token way of javacc/javacc#90 is the way to go. With this I could switch on / off token useage.

The implementation has started. I now learned how to implement this token fallback method. It seems to work as expected.

The bracket quotation will still be in place, but has to be enabled by some parser switch. I think the array functionality should be the default behaviour.

commented

I think the array functionality should be the default behaviour.

+1 , provided we have a note on the wiki/readme around this :)

Hi all (I mean whoever is interested in this sort of thing...). There is a specific reason why this sort of problem is basIcally intractable in legacy JavaCC. I explain it here

That is a blog post of mine from nearly 5 months ago. A very big problem is that legacy JavaCC provides no way to "massage" the token stream in a CommonTokenAction (what I call a "token hook") method because it's called in the wrong spot. But this has been fixed.

Now, in terms of supporting mutiple grammars (I know you reject that, and I can understand why, but if you finally did decide to go with that sort of solution, JavaCC21 now has a preprocessor.. And that's pretty robust. I'm using it increasingly in internal development. I mean to say, I can understand why you dislike the idea of maintaining multiple grammars/parsers, but the thing is that it's not such an unmaintainable mess if you have both an INCLUDE directive combined with the aforementioned preprocessor. You can do things like:

   #if standard_syntax
        INCLUDE "standard.javacc"
   #elif alternative_syntax1
        INCLUDE "alternative1.javacc"
    #else
         INCLUDE "other_alternative.javacc"
     #endif

This is the preprocessor that is standard in C# actually. So, with these features, maintaining multiple grammars that are slightly different one from the other (though perhaps still slightly evil) is now basically tractable....

But, you know, generally speaking, every single major problem in legacy JavaCC has been addressed in JavaCC21. In fact, the last two really big issues were the lack of support for supplementary unicode characters and that really annoying code too large problem. and I nailed those recently.

So, I very honestly believe that every last major issue in legacy JavaCC has been fixed. In my work, I mean, which is the only actively developed version of JavaCC.... Oh, and come to think of it, you might be interested in this.

Oh, that, and the fact that you are not limited to JDK7 level Java in your code actions. https://javacc.com/2021/03/21/announcement-jdk-16-support/

Well, anyway, there's the shameless plug (or multi-plug) but the whole situation is kind of dumbfounding really. Just look through the various links. I think that every problem you have with that crufty old legacy JavaCC is now solved. Do you want to pretend that all this work I did simply doesn't exist or something?

@SerialVelocity would it be more feasible to have provider specific implementations or jsqlparser and the common api for parsing in different modules?

jsqlparser - the api that loads a parser implementation
jsqlparser-tsql - specific to tsql

So

Statement stmt = CCJSqlParserUtil.parse("SELECT * FROM tab1");

Would still be the public api but would return a TSQLStatement or MySQLStatement based on what parser impl you have on the classpath?

The problem is that if you were try to do this with the legacy JavaCC tool, it would quickly turn into an unmaintainable mess, because it lacks key features that allow you to do this in a maintainable way. The key features in question are a preprocessor and an INCLUDE directive.

(I basically said the same thing in another message already, but that could have gotten lost in the large amount of information, so....)

Just for your information, this whole issue has been killed dead now in JavaCC21. Just look at this blog post