JSQLParser / JSqlParser

JSqlParser parses an SQL statement and translate it into a hierarchy of Java classes. The generated hierarchy can be navigated using the Visitor Pattern

Home Page:https://github.com/JSQLParser/JSqlParser/wiki

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

API changes: extend visitor with return values

wumpz opened this issue · comments

If you try to build a CopyVisitor of some kind of object hierarchy, one has to build some kind of Stack structure, to give the actual visitor a hint, where it should write its copied data.

I suggest to extend all visitor methods with a return value:

MyType visitor(MyType param);

The standard implementation should return the actual object inserted.

public interface StatementVisitor {
    Comment visit(Comment comment);
    Commit visit(Commit commit);
    Delete visit(Delete delete);
    Update visit(Update update);

Using this we would be able to create some kind of functional visitor implementation.

There is a thumbs up above, but I am more interested in textual comments :).

commented

I'm having a bit of a hard time sort of visualizing this, so correct me if I wrong.

Let's assume we just had a SelectVisitor to copy the Select.

Old Way:

class OldSelect implements SelectVisitor {

private Select copy = new Select();

// super naive and bad just illustrating examples
Select copy(Select original) {

  visit(original);

return copy;
}

  @Override
    public void visit(Select select) {
      copy.setBody(deepClone(select.getSelectBody());

        // potentially the with items would be
      copy.setWithItems(deepClone(select.getWithItems()));
    }
}

With new pattern:

class NewCopySelect implements SelectVisitor {


// super naive and bad just illustrating examples
Select copy(Select original) {
  return visit(original);
}

  @Override
    public Select visit(Select select) {
   // and other copying stuff
     Select copy = new Select();
      copy.setBody(deepClone(select.getSelectBody());

    // potentially the with items would be
    copy.setWithItems(visit(select.getWithItems());
      return copy;
    }
}

Or am I Missing something.

Correct. With this new style the visitors itself return the copy and the caller could immediately put it in the right place. So you don't need to organise your copied items in some sort of stack to find the last generated. Imho a global variable here should be avoided but cannot within the actual implementation.

commented

extend all visitor methods with a return value

Would it be valuable to have CopyStatementVisitor and the current StatementVisitor ? If not, then changing StatementVisitor would be fine, probably just a big warning to all consumers that things no longer return void.

Indeed this would be a massive api change. That is why we would go for version 4.

If there are two visitor interfaces then we would need to visit methods per class. So that is IMHO a no go.

commented

IIUC, the suggested change is that all the visit methods of the visitors would be extended to return a copy of the object they receive as the argument to help with implementing the copy visitor. Is that correct?

If it is then, is my understanding that this is useful only to those who want a deep copy of an object, correct? I'm just wondering whether this huge API change is worth the migration burden on the users who want to move to version 4 of the library? That's because they would have to change every extension of every visitor they would have implemented as a consequence of this change. That sounds like a lot of work and might not be a good thing to bring upon the users. Just my opinion.

@sivaraam: that is correct. As well this change does not support replacement by different types, e.g. Column to literal or CaseExpression or you name it. That is why I am getting away from this. The problem here to solve IMHO the backreference to the parent object. How to replace an expression, while there is no reference to the parent? Now you have to adapt every possible parent. To make things worse you not only need the reference to the parent but to the attribute as well.

As well this change does not support replacement by different types, e.g. Column to literal or CaseExpression or you name it. That is why I am getting away from this.

Just for the sake of discussion, is it not possible to overcome this by making the visit methods return the appropriate super-classes? For example, in the case you mentioned we could just make the visit method corresponding to the Column of the appropriate visitor to return a generic Expression object and the implementation of the visit method could think about what type of Expression should be returned. I think this would work for any given visitor, and the objects it is visiting. Of course, this would possibly make it difficult to comprehend why the visit methods return a different type. But otherwise, does this sound like plausible solution?

That said, I still believe that modifying the existing visitor methods would place a huge migration burden on the users. To overcome this I think we can re-consider the following which was suggested before:

Would it be valuable to have CopyStatementVisitor and the current StatementVisitor ?

That would of course double the amount of visitor classes but at least it would not introduce any migration burden on the existing users. Also, having separate visitor classes would ensure that we don't add any additional (unrelated) responsibility to the visit methods. IOW, it would ensure SRP.

So how to overcome this problems of the visitor pattern? Any ideas?

I agree with @sivaraam but would go one step further and make the extra visitor generic:

public <T> T accept(Generic*Visitor<T> visitor);

This would:

  • Allow CopyVisitor to be written. Each visitor would be allowed to have its own type so you can have implements GenericStatementVisitor<Statement>, GenericExpressionVisitor<Expression>, etc.
  • Allow custom types to be returned too. e.g. the TableVisitor could have T set to List<TableName> (this has the added benefit of easily making the class thread-safe because you no longer have a field lying around).
commented

So how to overcome this problems of the visitor pattern? Any ideas?

Would it be possible to invert things for consumption sake. Right now we have

interface SimpleVIsitor {

 void  visitFoo(Foo foo);
}

And the proposed change is:

interface SimpleVisitor {
Foo visitFoo(Foo foo);
}

What if we introduced the CopyVisitor as a lower level stack, so we could have:

interface CopyableVisitor {

 Comment copy(Comment comment);
}

interface SelectVisitor  extends CopyableVisitor {
default void visit(Comment comment) { copy(comment); }
}

Since this is java 8, we could use the default feature to make this at least easy for consumers?

@AnEmortalKid That would be a possibility. However, the difficulties are still there. Let us stick with our example of copying the hierarchy, then die proposed changes would do their job via something like

Expression visit(MyExpression expr) {
     MyExpression my = new MyExpression();
     my.setLeftExpression(visit(expr.getLeftExpression()));
     return my;
}

Unfortunately, this misses one level of indirection from the visitor pattern, it should be

    my.setLeftExpression( expr.getLeftExpression().accept(this) );   

Not only the visit methods but the accept methods have to be changed as well.

If we want to create a different type of visitor, like replace all columns with numbers. A really stupid one, but you get the idea of replacing stuff. Then you have to make the return type "nearly everywhere" an Expression.

Additionally, a big problem with this actual visitor stuff is the missing context. If the visit Column method would know, where it came from (BinaryExpression, left) we could take advantage of this as well and so on and so far.

At the moment I think every visitor - API change would go in the wrong direction, because every task needs other specialities. So my thoughts are going in a different direction right now, by building on an actual visitor some kind of context via reflection or so, to overcome most of possible infrastructure problems to solve this issues.

commented

So my thoughts are going in a different direction right now, by building on an actual visitor some kind of context via reflection or so, to overcome most of possible infrastructure problems to solve this issues.

Ah, so something we could push down information on like:

this.setContext(TABLE.X)
column.accept(this);

What sort of things would we need in the context?

Right. One could build some kind of proxy for the visitor, that could keep track of the previous or parent object. So our replacement would be simple.

To add to what @SerialVelocity suggested:

I really like the Presto parser API. For example, I have some code where I am converting from SQL to a datalog dialect, and this is one example of how I use the Presto API to do so.


    /*
     * Translates literals into corresponding DDlogRecord instances
     */
    private static class ParseLiterals extends AstVisitor<DDlogRecord, Boolean> {

        @Override
        protected DDlogRecord visitStringLiteral(final StringLiteral node, final Boolean isNullable) {
            try {
                return maybeOption(isNullable, new DDlogRecord(node.getValue()));
            } catch (final DDlogException e) {
                throw new RuntimeException(e);
            }
        }

        @Override
        protected DDlogRecord visitLongLiteral(final LongLiteral node, final Boolean isNullable) {
            return maybeOption(isNullable, new DDlogRecord(node.getValue()));
        }

        @Override
        protected DDlogRecord visitBooleanLiteral(final BooleanLiteral node, final Boolean isNullable) {
            return maybeOption(isNullable, new DDlogRecord(node.getValue()));
        }
    }

And elsewhere, I can do this:

// later, in some method
DDLogRecord record = parseLiterals.process(literal, isNullable);

Note that the visit methods can receive a context and a return value, both of which are generic types parameterized by AstVisitor<X, Y>. And in the above case, the ParseLiterals class is purely stateless (so I can re-use a single instance for the lifetime of my program).

I've been trying out the JSQLParser API simply because Presto is an odd SQL dialect and doesn't support some statements I need. But writing something like the above becomes cumbersome. The return values and the context I pass through the visitors (generic parameters X and Y) now become fields.

Here's the AstVisitor implementation in Presto: https://github.com/prestodb/presto/blob/master/presto-parser/src/main/java/com/facebook/presto/sql/tree/AstVisitor.java

The problem is described in this discussion. What type to return? Or do you want to multiply one visitor by adding different context types?

If I get your question correctly (correct me if I'm wrong please :) ), the signature I'm thinking of is:

public abstract class BaseVisitor<R, C>
{
    public R process(Node node)
    {
        return process(node, null);
    }

    public R process(Node node, @Nullable C context)
    {
        return node.accept(this, context);
    }

   // visit methods for each SQL node...
   public R visit(<SomeSqlType> node, C Context) {
   ....
   }
}

And yes, that will result in each use case specifying return and context types as they need. Visitors that don't need to return anything or need any context can use Void types for R or C.