r2dbc / r2dbc-spi

Service Provider Interface for R2DBC Implementations

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Specification request on Statement::add

lukaseder opened this issue · comments

To create a statement batch from a prepared statement with bind variables, we can see an example in the documentation:
https://r2dbc.io/spec/0.9.0.M1/spec/html/#statements.batching

// connection is a Connection object
Statement statement = connection.createStatement("INSERT INTO books (author, publisher) VALUES ($1, $2)");
statement.bind(0, "John Doe").bind(1, "Happy Books LLC").add();
statement.bind(0, "Jane Doe").bind(1, "Scary Books Inc");
Publisher<? extends Result> publisher = statement.execute();

I've noticed that the MariaDB driver requires an additional trailing .add() call for the last set of bind values to be added to the batch, more like:

// connection is a Connection object
Statement statement = connection.createStatement("INSERT INTO books (author, publisher) VALUES ($1, $2)");
statement.bind(0, "John Doe").bind(1, "Happy Books LLC").add();
statement.bind(0, "Jane Doe").bind(1, "Scary Books Inc").add();
Publisher<? extends Result> publisher = statement.execute();

Unfortunately, the spec isn't very clear about who's right. Both versions make sense in a way:

  • The documentation version is consistent with adding only one set of binds and then calling Statement::execute directly, without any call to Statement::add.
  • The MariaDB version is consistent with JDBC, where Statement::addBatch is called after each set of bind values, and finally the batch is executed with Statement::executeBatch. It is also more consistent with io.r2dbc.spi.Batch, where you need two calls to Batch::add to execute 2 statements, not 1 call.

I think that last bit hints at the real problem here. Unlike JDBC, the SPI is re-using Statement::execute, and as such, leaving room for interpretation to the semantics of Statement::add

I think there are two things to be improved here:

  1. Clearly specify whether Statement::add is
    1. required for the last set of binds (then, most drivers will have a bug)
    2. optional (then, the MariaDB driver has a bug, and another ambiguity arises, see below)
    3. wrong (then, the MariaDB driver has a bug)
    4. implementation specific (undesirable, as the callers would be left with the burden of writing driver-agnostic code, but then, no one has a bug)
  2. Optionally, refactor the API to remove this ambiguity, to require something more similar to JDBC and to io.r2dbc.spi.Batch

There's an ambiguity with respect to the optionality of the Statement::add call as suggested in 1ii). Consider this example, without binds:

Statement statement = connection.createStatement("INSERT INTO books (author, publisher) VALUES ('a', 'b')");
statement.add(); // This adds 1 row
statement.add(); // This adds 1 row
Publisher<? extends Result> publisher = statement.execute(); // What happens here?

Without binds (which is an edge case, yes, but the SPI should specify what happens in this case), there may or may not be an additional, 3rd row added. I tend to think that there's a third row, consistent with:

Statement s1 = connection.createStatement("INSERT INTO books (author, publisher) VALUES ('a', 'b')");
Publisher<? extends Result> publisher = s1.execute(); // 1 row is inserted

Statement s2 = connection.createStatement("INSERT INTO books (author, publisher) VALUES ('a', 'b')");
Publisher<? extends Result> publisher = s2.add().execute(); // 2 rows are inserted

Statement s3 = connection.createStatement("INSERT INTO books (author, publisher) VALUES ('a', 'b')");
Publisher<? extends Result> publisher = s3.add().add().execute(); // 3 rows are inserted

So, all of this hints at 1iii) or 2) being the most desirable options here, with a strong favour of 1iii), because an API re-design is costly, and probably not really necessary here.

Statement.add() should semantically translate to createStatement(…) providing a fluent API to specify bindings. It's also in place to avoid the need for overhead for inspecting whether a statement is parametrized. Calling add() should record the current statement state of bindings and prepare the statement object for accepting the next set of bindings whereas no bound parameters represent an empty binding set.

We should refine the specification wording to reflect that intent.

cc @rusher @rhedgpeth

Is it the best solution @mp911de ?

How framework expect using driver when batching, probably looping, then calling add, like https://github.com/micronaut-projects/micronaut-r2dbc/blob/master/data-r2dbc/src/main/java/io/micronaut/data/r2dbc/operations/DefaultR2dbcRepositoryOperations.java#L774

postgresql and mysql driver implementation when having parameter doesn't care about last add() :
i.e. in the following example the last .add() can either be present or remove :

        connection
                .createStatement("INSERT INTO batchStatement values ($1, $2)")
                .bind(0, 2)
                .bind(1, "test")
                .add()
                .bind(0, 3)
                .bind(1, "test2")
                .add()
                .execute()

I'll change mariadb driver to do likewise if spec indicate that is the way.

add is defined as:

Save the current binding and create a new one.

which translates to

      connection
                .createStatement("INSERT INTO batchStatement values ($1, $2)")
                .bind(0, 2)
                .bind(1, "test")
                .add()
                .bind(0, 3)
                .bind(1, "test2")
                .execute()

Calling add before execute without bindings should actually fail on the server-side so that the server reports missing bindings.

It makes sense, this was initial implementation, changed after mariadb-corporation/mariadb-connector-r2dbc#8 after differences with other drivers that permits .add() i did not check that was optionnal in those connectors at that time. :/

anyway, at least clarifying this is nice.

I currently struggle to find how we can improve the specification. The method is documented with Save the current binding and create a new one. and the docs show a correct example. Would extend the Javadoc to the following help:

    /**
     * Save the current binding and create a new one to indicate the statement should be executed again with new bindings provided through subsequent calls to {@code bind} and {@code bindNull}.
     * …
     */
    Statement add();

I think the difficulty both for users and implementors of the SPI is to understand that the last binding must not be "added", but executed directly, and that this works differently from JDBC.

Perhaps this is better explained by example than by prose?

Also, the TCK seems to be wrong:

https://github.com/r2dbc/r2dbc-spi/blob/47e2517/r2dbc-spi-test/src/main/java/io/r2dbc/spi/test/TestKit.java#L329-L343

And

https://github.com/r2dbc/r2dbc-spi/blob/47e2517/r2dbc-spi-test/src/main/java/io/r2dbc/spi/test/TestKit.java#L577-L593

Good catch, need to fix that. I'll come up with a pull request.

Good catch, need to fix that. I'll come up with a pull request.

Perhaps, how about two types of test cases:

  • A test that exposes the correct behaviour and runs
  • A test that has a trailing add() call and expects failure

A test that has a trailing add() call and expects failure

Makes sense for 0.9, introducing that kind of change in a bugfix release asks for trouble.

Makes sense for 0.9, introducing that kind of change in a bugfix release asks for trouble.

Sure, I'm thinking that all of this issue here is going into 0.9 only, for the benefit of new, future drivers that will get it right from the beginning.