tarantool / cartridge-springdata

Spring Data Tarantool

RFC: Confusing annotation functionality of @Tuple

ArtDu opened this issue · comments

Problem

Currently the @Tuple annotation solves two unrelated problems:

  1. Specifying the space name on a POJO (Entity) so that SimpleTarantoolRepository works correctly, i.e. so that its predefined methods access the required space.
  2. Marking Repository methods annotated with @Query to indicate that the only response we expect is a serialized object: a Tuple (number indexed table). The space name is given so that its metadata can be used to get the keys for serialization. Without @Tuple, a @Query method expects an object (a Map or a primitive data type). This is necessary, as I understand it, because converters that can accept both Tuple and other data types have not yet been implemented.

Because the annotation performs two different functions, it confuses the user. It is hard to decide where to specify the annotation, where not to, and for what purpose. The user can interpret @Tuple either as an indication of which space we want to use and its format (metadata), or as an indication that the return type is a Tuple (which, in truth, may not immediately come to mind).

Suggested ways:

We can go two ways:

  1. Change the API
    a) Break backwards compatibility
    b) Leave support for existing functionality
  2. Write more detailed documentation with all notes

Research

  1. Change the API

Suggestions:

Move the work logic to @Query

@Tuple in Repository is hardwired to @Query, so it can only be used in conjunction with @Query. We could move the functionality of @Tuple to @Query by adding additional parameters. Examples:

  •  @Query(function = "func_name", serialized = true, [spaceName="space_name"])
  •  enum QueryReturnType {
        TUPLE,
        OBJECT
     }
     ...
     @Query(function = "func_name", returnType=QueryReturnType.TUPLE, spaceName="test_space")

etc.
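To make the second variant concrete, here is a minimal, self-contained sketch. It does not use the real library: `Query`, `QueryReturnType` and `BookRepository` below are hypothetical stand-ins declared only for this example, and defaulting `returnType` to `OBJECT` is an assumption that mirrors the current behaviour of @Query without @Tuple.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical enum from the suggestion above; not part of the current API.
enum QueryReturnType { TUPLE, OBJECT }

// Hypothetical stand-in for the library's @Query. The returnType and spaceName
// parameters are the proposed additions; defaulting to OBJECT is an assumption.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Query {
    String function();
    QueryReturnType returnType() default QueryReturnType.OBJECT;
    String spaceName() default "";
}

// Hypothetical repository; the names are for illustration only.
interface BookRepository {
    @Query(function = "find_book", returnType = QueryReturnType.TUPLE, spaceName = "test_space")
    Object findBook(int id);

    @Query(function = "count_books")
    Object countBooks();
}

final class QueryReturnTypeDemo {
    // Reads the annotation and reports how the response would be interpreted.
    static String describe(Class<?> repository, String methodName, Class<?>... parameterTypes) {
        Method method;
        try {
            method = repository.getMethod(methodName, parameterTypes);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such repository method: " + methodName, e);
        }
        Query query = method.getAnnotation(Query.class);
        if (query == null) {
            throw new IllegalArgumentException(methodName + " is not annotated with @Query");
        }
        String description = query.function() + " -> " + query.returnType();
        if (query.returnType() == QueryReturnType.TUPLE) {
            description += " (space " + query.spaceName() + ")";
        }
        return description;
    }

    public static void main(String[] args) {
        System.out.println(describe(BookRepository.class, "findBook", int.class));
        System.out.println(describe(BookRepository.class, "countBooks"));
    }
}
```

With this shape, everything that affects how a stored-function response is read sits on the method itself, and @Tuple is left with a single meaning on the entity.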

Break backwards compatibility
We can completely remove the old @Tuple logic for Repository, or keep it but mark it as deprecated.

  2. Write more detailed documentation

Explicitly indicate in the README, in the important section, or in bold that @Tuple in the Repository is used to identify that the Tarantool output will accept ONLY Tuples for domainClass returnType - i.e. number indexed table.
For example, this could be {{val1, val2}}, box.space.get(...), etc. @Tuple in Entity is needed so that the predefined SimpleTarantoolRepository can be used and accesses the correct space.

In README now:

You can bind repository methods to calls of the stored functions in the Tarantool instance using the @Query annotation
with the stored function name specified in the functionName parameter.
For such methods, you can specify the stored function response format so that it will be parsed correctly. The response
format may be either an object (and a list of objects) or a tuple (and a list of tuples).

The response format may be either an object (and a list of objects) or a tuple (and a list of tuples).

It seems worth clarifying what an object is and what a tuple is: the "object" response format is really a map (a string indexed table), and a tuple is a number indexed table {{val1, val2}}. It is also worth pointing out that we can return primitive data types.

For such methods, you can specify the stored function response format so that it will be parsed correctly

It should be stated explicitly that if we specify @Tuple, we accept only an array (number indexed tuple), and if we do not specify it, we accept a map (string indexed table) or primitive data.

Also, JavaDoc comments are clearly outdated.

/**
 * Identifies domain object for saving into a Tarantool space
 *
 * @author Alexey Kuzin
 */
@Persistent
@Inherited
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
public @interface Tuple {
    /**
     * The name of Tarantool space where the marked class objects are supposed to be stored in. The space name will be
     * derived from the class name if not specified. Alias for {@link #spaceName()}.
     *
     * @return the name of space for storing the object
     */
    @AliasFor("spaceName")
    String value() default "";

    /**
     * The name of Tarantool space where the marked class objects are supposed to be stored in. The space name will be
     * derived from the class name if not specified. Alias for {@link #value()}.
     *
     * @return the name of space for storing the object
     */
    @AliasFor("value")
    String spaceName() default "";
}

PS See also #30 to make API consistent.

Feedback after a verbal discussion (2022.01.13) with @vrogach2020 and @wey1and:

  1. Remove @Tuple from the RepositoryInterface (break backward compatibility)

  2. Change @Query to accept Tuple by default (to be like in SimpleTarantoolRepository)

  3. No flags in @Query. We check whether @Tuple is present on the Entity: if it is, we expect a tuple response (and take the schema name from it); if not, we expect a map/primitive.

    • Write tests to check that this works correctly:
       class SimpleArray {
         Integer field1;
         String field2;
         ...
       }
      
       @Tuple("schema1")
       class SimpleArrayTuple extends SimpleArray {}
      
       @Tuple("schema2")
       class SimpleArrayTuple2 extends SimpleArray {}
      
       @Tuple("schema3")
       class SimpleArrayTuple3 extends SimpleArray2 {}
    • If the space name is not specified, we derive it from the entity name in snake_case
       @Tuple
       class SimpleArrayTuple extends SimpleArray {}
  4. (Separate ticket) We always want to parse metadata if we expect a Tuple in the response - {"metadata": [{"name": "id", "type": ""}, {"name": ""}], "rows": [...]} (not working now). Then:

    • The order of applying the metadata:
    1. Metadata from response (* caching)
    2. Metadata from ddl.get_schema
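Point 3 of this feedback (deciding by the presence of @Tuple on the entity, with a snake_case fallback for the space name) could be sketched roughly as follows. The `Tuple` annotation here is a simplified stand-in without Spring's @AliasFor, `UnnamedArrayTuple` and `SpaceNameResolver` are hypothetical names, and the exact snake_case rule is an assumption; the library may derive names differently.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Inherited;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Locale;
import java.util.Optional;

// Simplified stand-in for the library's @Tuple (no Spring @AliasFor here).
@Inherited
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Tuple {
    String value() default "";
}

class SimpleArray {
    Integer field1;
    String field2;
}

@Tuple("schema1")
class SimpleArrayTuple extends SimpleArray {}

// Hypothetical entity with no explicit space name.
@Tuple
class UnnamedArrayTuple extends SimpleArray {}

final class SpaceNameResolver {
    // Returns the space name when the entity is annotated with @Tuple (a tuple
    // response is expected), or empty when a map/primitive response is expected.
    static Optional<String> resolveSpaceName(Class<?> entity) {
        Tuple tuple = entity.getAnnotation(Tuple.class);
        if (tuple == null) {
            return Optional.empty();
        }
        if (!tuple.value().isEmpty()) {
            return Optional.of(tuple.value());
        }
        return Optional.of(toSnakeCase(entity.getSimpleName()));
    }

    // Assumed rule: insert '_' before each upper-case letter that follows a
    // lower-case letter or digit, then lower-case the result.
    static String toSnakeCase(String name) {
        return name.replaceAll("([a-z0-9])([A-Z])", "$1_$2").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(resolveSpaceName(SimpleArrayTuple.class));
        System.out.println(resolveSpaceName(UnnamedArrayTuple.class));
        System.out.println(resolveSpaceName(SimpleArray.class));
    }
}
```

Because the decision is derived from the entity alone, the repository method needs no flag in this variant; the cost is that the expected response format is implicit.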

Discussed by voice with @wey1and (2022.01.14)

We decided that the solution with the parameter in @Query would still be much more transparent. That is, the plan is as follows:

  1. Remove @Tuple from the RepositoryInterface (break backward compatibility)
  2. Change @Query to accept Tuple by default (to be like in SimpleTarantoolRepository)
  3. We add an additional flag for @Query, which will indicate whether we use a converter for tuples or for objects.
    Options:
@Query(function = "", receiveTuple = false) // default receiveTuple = true
@Query(function = "", tupleConverter = false) // default tupleConverter = true
@Query(function = "", useTupleConverter = false) // default useTupleConverter = true
...

That is, if the resulting type has the @Tuple(spaceName = "") annotation, then we can use both the SimpleTarantoolRepository functions and functions annotated with @Query(function = "").

Thank you for this detailed RFC! Let me put down some thoughts here that may help improve it:

This is necessary, as I understand it, because converters that can accept both Tuple and other data types have not yet been implemented.

There were no objectives for combining both tuples and other types in the server response yet. The core problem that required a presence of the Tuple annotation on methods -- providing a possibility for the users to specify what type of the server response they expect. Automatic detection of the response type is not possible since the serialization format is the same for all cases and the internal representation is not distinguishable from the transport (e.g. driver) POV.

Explicitly indicate in the README, in the important section, or in bold that @Tuple in the Repository is used to identify that the Tarantool output will accept ONLY Tuples for domainClass returnType - i.e. number indexed table.

The hard thing to understand here is the fact that:
a) The server response format is different for each case internally and has many types (apart from the usual combination of single object / multi object of same type for many other database servers, where most of them are dealing with SQL rows for both variants and most of others use simple structures like JSON). Our protocol is very complicated, since the actual data are buried inside several layers of wrapping that follows down the API layers (IPROTO, crud, Lua responses). We obviously cannot solve this complexity by some smart magic inside the driver, because it requires too much knowledge of the user logic behind the data, so it will either leak inevitably to the surface or will not work for all cases. I'd prefer showing the "nuts and bolts" to the users rather than frustrating them by artificial restrictions.
b) We actually have two directions of serialization. That probably needs to be clarified further in the README. When we specify an annotation on a POJO class, it tells us not only about deserialization but also about serialization. We have to support at least three cases: writing tuples, writing maps and writing primitives. That follows from the types of API calls that we have. And we cannot write POJOs as tuples without specifying a special annotation; that's how it is done in the Spring Data framework.
On the way back, when deserializing a value, it is necessary to give the deserializer a hint about how to interpret the msgpack structures it receives as input, since they are all the same from the protocol POV, but are different from the business logic POV. I don't like much the term "number indexed table" since it is actually wrong: for tuples we have two nested tables (always), while we can either have or not have a wrapping table depending on whether we are receiving a single object or a list of them. Furthermore, a string-indexed table, as you refer to it, is a special case of a primitive, and it may appear as either a container-type value or a response object type. The key thing is that the driver takes responsibility for dealing with serialization/deserialization of the whole diversity of the object types, and here we (or the user) just need to provide the proper hint for the driver about how to interpret the particular input/output objects.

@Query(function = "", receiveTuple = false) // default receiveTuple = true
@Query(function = "", tupleConverter = false) // default tupleConverter = true
@Query(function = "", useTupleConverter = false)

Boolean flags are a no-go here; use either enums or class references.

I will try to lay out my thoughts and analysis on these comments.

Consider different response structures from Tarantool:

  • box.tuple - there is one kind of real tuple, box.tuple. Its internal representation (for Lua) is cdata, and its serialized form (what is transmitted via net.box) is MP_ARRAY (the MessagePack array, indexed by position). We can access the tuples (spaces) of the instance directly through

    • IProto (INSERT, SELECT, etc.)
    • an IProto CALL/EVAL operation that internally does return box.space.select or something similar.

    In either case we get the same result, an array in an array:

    body: {48: [[1, 2, 3]]}
    

This response is accepted by the Java connector as a TarantoolTuple if it matches the specified metadata format.

  • crud API - here we can only get the result through the IProto(CALL/EVAL) call, but we get the result already in the form

    body: {48: [{"metadata": [{"name": "id", "type": "unsigned"}, ...], "rows": [[1, 12477, "1", " eleven"]]}]}
    

    This response is accepted by the Java connector as a TarantoolTuple if rows matches the specified metadata format.

  • Lua response (array in array) - here we also call IProto (CALL/EVAL). We can return a table {{1,2,3}} from Lua, and it is serialized to MP_ARRAY just like box.tuple.

    body: {48: [[1, 2, 3]]}
    
  • Lua response (map/primitive) - here we also call IProto (CALL/EVAL). We can return an associative table from Lua {key1 = val1, key2 = val2...}.
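The four structures above can be told apart only by shape, which the following sketch illustrates. It works on already decoded values (plain `List` and `Map`), not on the real driver types; `ResponseShape` and `ResponseShapes` are hypothetical names. Note that box.tuple and a Lua table {{1,2,3}} land in the same bucket, and a user map that happens to contain "metadata" and "rows" keys would be mistaken for a crud response.

```java
import java.util.List;
import java.util.Map;

// Hypothetical labels for the response structures listed above.
enum ResponseShape { ARRAY_IN_ARRAY, CRUD_RESPONSE, MAP, PRIMITIVE }

final class ResponseShapes {
    // 'body' is the decoded list of returned values (the value under key 48
    // in the examples above). Only the first returned value is inspected.
    static ResponseShape classify(List<?> body) {
        if (body.isEmpty()) {
            throw new IllegalArgumentException("empty response body");
        }
        Object first = body.get(0);
        if (first instanceof List) {
            // box.tuple and a Lua table {{1,2,3}} are indistinguishable here.
            return ResponseShape.ARRAY_IN_ARRAY;
        }
        if (first instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) first;
            if (map.containsKey("metadata") && map.containsKey("rows")) {
                return ResponseShape.CRUD_RESPONSE;
            }
            return ResponseShape.MAP;
        }
        return ResponseShape.PRIMITIVE;
    }

    public static void main(String[] args) {
        System.out.println(classify(List.of(List.of(1, 2, 3))));
        System.out.println(classify(List.of(Map.of("metadata", List.of(), "rows", List.of()))));
        System.out.println(classify(List.of(Map.of("key1", "val1"))));
        System.out.println(classify(List.of(42)));
    }
}
```

The ambiguities in this heuristic are the reason a hint from the user is still needed in some cases.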


From these response structures, we can infer:

I don't like much the term "number indexed table" since it is actually wrong: for tuples we have two nested tables (always), while we can either have or not have a wrapping table depending on whether we are receiving a single object or a list of them.

  • It was wrong to call such response structures "number indexed table". It's probably better to use the word array instead of number indexed table.

  • Currently the driver only accepts an array within an array; we cannot pass a single tuple {1,2,3} without a wrapping array. A test is needed for this; ticket #79 already exists.

Our protocol is very complicated, since the actual data are buried inside several layers of wrapping that follows down the API layers (IPROTO, crud, Lua responses). We obviously cannot solve this complexity by some smart magic inside the driver, because it requires too much knowledge of the user logic behind the data, so it will either leak inevitably to the surface or will not work for all cases.

The core problem that required a presence of the Tuple annotation on methods -- providing a possibility for the users to specify what type of the server response they expect.

  • Yes, we provide options for users to specify what type of server response they expect, but we do it redundantly. The user specifies returnType in the repository method and does not need to specify anything else. I will try to explain why.

    If the user wants to get a compound object of some class, they specify it in returnType. There must then be rules for the object mapping: the returned object must have keys by which the fields of the object that came from Tarantool are mapped to the fields of the Java class. There are two options for transferring compound objects from Tarantool:

    1. Transfer of keys along with the answer - map. If the user passes the keys along with the response, then we know how to map the objects and we do this without using any metadata stored inside the connector.
    2. Passing an object without keys - arrayInArray/crudResponse. The keys are already stored in Java. Then we expect that the structure of the tuple will match the returnType class metadata(@Tuple in Entity). Note, crud response can also be attributed to the first type.

    Specifying the additional flag useTupleConverter, or @Tuple in Repository, only performs a validation function: it restricts the data formats we accept.
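The two transfer options can be sketched as follows. This is a minimal illustration on plain collections, not the connector's converter code; `FieldMapping` and its methods are hypothetical, and the field-name list stands in for whatever metadata is held on the Java side.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FieldMapping {
    // Option 1: keys travel with the response, so no stored metadata is needed.
    static Map<String, Object> fromMap(Map<String, ?> response) {
        return new LinkedHashMap<>(response);
    }

    // Option 2: a bare array; field names come from metadata kept on the Java
    // side (for example, the space format behind @Tuple on the entity).
    static Map<String, Object> fromTuple(List<?> row, List<String> fieldNames) {
        if (row.size() > fieldNames.size()) {
            // The only real "validation": the row must fit the known format.
            throw new IllegalArgumentException(
                "row has " + row.size() + " fields, metadata describes " + fieldNames.size());
        }
        Map<String, Object> result = new LinkedHashMap<>();
        for (int i = 0; i < row.size(); i++) {
            result.put(fieldNames.get(i), row.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fromMap(Map.of("id", 1)));
        System.out.println(fromTuple(List.of(1, "first"), List.of("id", "name")));
    }
}
```

In both cases the target is the same keyed structure, which then maps onto the returnType class.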

The problem is that we can enable this validation only for tuples without metadata (we will assume that metadata from crudResponse does not exist, because we do not use it). For other types (Map, Primitives) there is no validation. As a result, the only enum we can write is something like this:

public enum ValidationFormat {
    FLATTEN_TUPLE,
    ANY_EXCEPT_FLATTEN_TUPLE
}

Which doesn't look pretty 😞

Proposal

Therefore I propose repeating steps 1 and 2 from #78 (comment) and using an enum like the one below:

  1. Remove @Tuple from the RepositoryInterface (break backward compatibility)
  2. Change @Query to accept Tuple by default (to be like in SimpleTarantoolRepository)
  3. We add an additional parameter in @Query
@Query(function = "returning_simple_map", output = ValidationFormat.ANY)

public enum ValidationFormat {
    FLATTEN_TUPLE,
    ANY
}

And either make a ticket, or implement within one PR the ability to accept a tuple(array) as TarantoolTuple or as ArrayList from callForObject.

Together with @vrogach2020 and @wey1and, we decided that the best solution for now is:

  1. Remove @Tuple from the RepositoryInterface (break backward compatibility)
  2. Change @Query to accept Tuple by default (to be like in SimpleTarantoolRepository)
  3. We add an additional parameter in @Query
@Query(function = "returning_simple_map", output = TarantoolSerializationType.AUTO)

public enum TarantoolSerializationType {
    TUPLE,
    AUTO
}

And make it so that callForObject(TarantoolSerializationType.AUTO) can accept TUPLE, MAP, PRIMITIVES and List<Object>; currently it can only accept MAP and PRIMITIVES.
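A rough sketch of how the agreed parameter might behave is shown below. `Query` is again a hypothetical stand-in for the library annotation, `OutputCheck` is an invented helper, and treating "tuple" as "an array whose elements are all arrays" is an assumption based on the array-in-array responses discussed earlier.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.List;
import java.util.Map;

enum TarantoolSerializationType { TUPLE, AUTO }

// Hypothetical stand-in for @Query with the agreed 'output' parameter;
// TUPLE is the default, as in step 2 above.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Query {
    String function();
    TarantoolSerializationType output() default TarantoolSerializationType.TUPLE;
}

final class OutputCheck {
    // True when 'value' is an array whose elements are all arrays.
    static boolean isArrayInArray(Object value) {
        if (!(value instanceof List)) {
            return false;
        }
        for (Object element : (List<?>) value) {
            if (!(element instanceof List)) {
                return false;
            }
        }
        return true;
    }

    // TUPLE accepts only tuple-shaped values; AUTO accepts tuples, maps,
    // primitives and plain lists alike.
    static boolean accepts(TarantoolSerializationType output, Object value) {
        switch (output) {
            case TUPLE:
                return isArrayInArray(value);
            case AUTO:
                return true;
            default:
                throw new IllegalStateException("Unknown output type: " + output);
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(TarantoolSerializationType.TUPLE, List.of(List.of(1, 2, 3))));
        System.out.println(accepts(TarantoolSerializationType.TUPLE, Map.of("key1", "val1")));
        System.out.println(accepts(TarantoolSerializationType.AUTO, Map.of("key1", "val1")));
    }
}
```

Compared with the boolean flags discussed earlier, the enum leaves room for further output types without changing the annotation's signature.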