Specification request on Statement::add
lukaseder opened this issue · comments
To create a statement batch from a prepared statement with bind variables, we can see an example in the documentation:
https://r2dbc.io/spec/0.9.0.M1/spec/html/#statements.batching
// connection is a Connection object
Statement statement = connection.createStatement("INSERT INTO books (author, publisher) VALUES ($1, $2)");
statement.bind(0, "John Doe").bind(1, "Happy Books LLC").add();
statement.bind(0, "Jane Doe").bind(1, "Scary Books Inc");
Publisher<? extends Result> publisher = statement.execute();
I've noticed that the MariaDB driver requires an additional trailing .add()
call for the last set of bind values to be added to the batch, more like:
// connection is a Connection object
Statement statement = connection.createStatement("INSERT INTO books (author, publisher) VALUES ($1, $2)");
statement.bind(0, "John Doe").bind(1, "Happy Books LLC").add();
statement.bind(0, "Jane Doe").bind(1, "Scary Books Inc").add();
Publisher<? extends Result> publisher = statement.execute();
Unfortunately, the spec isn't very clear about who's right. Both versions make sense in a way:
- The documentation version is consistent with adding only one set of binds and then calling
Statement::execute
directly, without any call toStatement::add
. - The MariaDB version is consistent with JDBC, where
Statement::addBatch
is called after each set of bind values, and finally the batch is executed withStatement::executeBatch
. It is also more consistent withio.r2dbc.spi.Batch
, where you need two calls toBatch::add
to execute 2 statements, not 1 call.
I think that last bit hints at the real problem here. Unlike JDBC, the SPI is re-using Statement::execute
, and as such, leaving room for interpretation to the semantics of Statement::add
I think there are two things to be improved here:
- Clearly specify whether
Statement::add
is- required for the last set of binds (then, most drivers will have a bug)
- optional (then, the MariaDB driver has a bug, and another ambiguity arises, see below)
- wrong (then, the MariaDB driver has a bug)
- implementation specific (undesirable, as the callers would be left with the burden of writing driver-agnostic code, but then, no one has a bug)
- Optionally, refactor the API to remove this ambiguity, to require something more similar to JDBC and to
io.r2dbc.spi.Batch
There's an ambiguity with respect to the optionality of the Statement::add
call as suggested in 1ii). Consider this example, without binds:
Statement statement = connection.createStatement("INSERT INTO books (author, publisher) VALUES ('a', 'b')");
statement.add(); // This adds 1 row
statement.add(); // This adds 1 row
Publisher<? extends Result> publisher = statement.execute(); // What happens here?
Without binds (which is an edge case, yes, but the SPI should specify what happens in this case), there may or may not be an additional, 3rd row added. I tend to think that there's a third row, consistent with:
Statement s1 = connection.createStatement("INSERT INTO books (author, publisher) VALUES ('a', 'b')");
Publisher<? extends Result> publisher = s1.execute(); // 1 row is inserted
Statement s2 = connection.createStatement("INSERT INTO books (author, publisher) VALUES ('a', 'b')");
Publisher<? extends Result> publisher = s2.add().execute(); // 2 rows are inserted
Statement s3 = connection.createStatement("INSERT INTO books (author, publisher) VALUES ('a', 'b')");
Publisher<? extends Result> publisher = s3.add().add().execute(); // 3 rows are inserted
So, all of this hints at 1iii) or 2) being the most desirable options here, with a strong favour of 1iii), because an API re-design is costly, and probably not really necessary here.
Statement.add()
should semantically translate to createStatement(…)
providing a fluent API to specify bindings. It's also in place to avoid the need for overhead for inspecting whether a statement is parametrized. Calling add()
should record the current statement state of bindings and prepare the statement object for accepting the next set of bindings whereas no bound parameters represent an empty binding set.
We should refine the specification wording to reflect that intent.
Is it the best solution @mp911de ?
How framework expect using driver when batching, probably looping, then calling add, like https://github.com/micronaut-projects/micronaut-r2dbc/blob/master/data-r2dbc/src/main/java/io/micronaut/data/r2dbc/operations/DefaultR2dbcRepositoryOperations.java#L774
postgresql and mysql driver implementation when having parameter doesn't care about last add() :
i.e. in the following example the last .add()
can either be present or remove :
connection
.createStatement("INSERT INTO batchStatement values ($1, $2)")
.bind(0, 2)
.bind(1, "test")
.add()
.bind(0, 3)
.bind(1, "test2")
.add()
.execute()
I'll change mariadb driver to do likewise if spec indicate that is the way.
add
is defined as:
Save the current binding and create a new one.
which translates to
connection
.createStatement("INSERT INTO batchStatement values ($1, $2)")
.bind(0, 2)
.bind(1, "test")
.add()
.bind(0, 3)
.bind(1, "test2")
.execute()
Calling add
before execute without bindings should actually fail on the server-side so that the server reports missing bindings.
It makes sense, this was initial implementation, changed after mariadb-corporation/mariadb-connector-r2dbc#8 after differences with other drivers that permits .add() i did not check that was optionnal in those connectors at that time. :/
anyway, at least clarifying this is nice.
I currently struggle to find how we can improve the specification. The method is documented with Save the current binding and create a new one.
and the docs show a correct example. Would extend the Javadoc to the following help:
/**
* Save the current binding and create a new one to indicate the statement should be executed again with new bindings provided through subsequent calls to {@code bind} and {@code bindNull}.
* …
*/
Statement add();
I think the difficulty both for users and implementors of the SPI is to understand that the last binding must not be "added", but executed directly, and that this works differently from JDBC.
Perhaps this is better explained by example than by prose?
Also, the TCK seems to be wrong:
And
Good catch, need to fix that. I'll come up with a pull request.
Good catch, need to fix that. I'll come up with a pull request.
Perhaps, how about two types of test cases:
- A test that exposes the correct behaviour and runs
- A test that has a trailing
add()
call and expects failure
A test that has a trailing add() call and expects failure
Makes sense for 0.9, introducing that kind of change in a bugfix release asks for trouble.
Makes sense for 0.9, introducing that kind of change in a bugfix release asks for trouble.
Sure, I'm thinking that all of this issue here is going into 0.9 only, for the benefit of new, future drivers that will get it right from the beginning.