gklijs / schema_registry_converter

A crate to convert bytes to something more usable, and the other way around, in a way compatible with the Confluent Schema Registry. Supports Avro, Protobuf, and JSON Schema, in both async and blocking variants.

multiple values encoding/decoding

verrchu opened this issue · comments

At times it is convenient to pass multiple events at once, to avoid inconsistent state if, for example, 1 out of 10 events causes some kind of error on its way.

I guess it would be great to have some API like (encode|decode)_many which would pack events together and then unpack them again. avro-rs actually works this way.

Hi, thanks for creating the issue. I do need a little bit more context though. What would be the difference from running it 10 times and just ignoring the error?

The idea is to be able to send events in batches. This way the events either are all sent or all fail to be sent. This allows for better consistency in cases where, for example, one API request results in multiple events.

And from my experience it is not a good solution to simply ignore errors caused by a failure to send an event. It could easily lead to data loss.

Yes, but this library doesn't send the events. Therefore I still don't know how the API would look. You could already encode/decode multiple items, and only send/use them when they all succeed.
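For illustration, a minimal sketch of that pattern, assuming a generic `encode_one` closure that stands in for whatever encoder call is used (for example the blocking AvroEncoder of this crate); the closure keeps the sketch independent of the exact signature:

```rust
/// Encode every event up front; fail as soon as one encoding fails,
/// so nothing gets sent unless the whole batch encoded successfully.
fn encode_all<T, E>(
    events: &[T],
    mut encode_one: impl FnMut(&T) -> Result<Vec<u8>, E>,
) -> Result<Vec<Vec<u8>>, E> {
    // Collecting an iterator of `Result`s stops at the first `Err`.
    events.iter().map(|event| encode_one(event)).collect()
}
```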

Ok maybe I am not being clear enough.

I am suggesting that this library provides the following API
encode_many(schema, values) -> encoded
decode_many(encoded) -> values
where encoded is a schema-id-prefixed binary object which contains batched serialized values
avro-rs operates in a similar manner with its Writer and Reader

this way an application can encode many events in one blob
and then atomically send this blob over the wire
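One possible shape of the API described above, purely hypothetical and not part of the crate; the trait, method names, and error type are made up for illustration:

```rust
use avro_rs::types::Value;

/// Hypothetical batch codec; not part of schema_registry_converter.
trait BatchCodec {
    type Error;

    /// Pack several values into one schema-id-prefixed binary blob.
    fn encode_many(&mut self, subject: &str, values: &[Value]) -> Result<Vec<u8>, Self::Error>;

    /// Unpack a blob produced by `encode_many` back into separate values.
    fn decode_many(&mut self, encoded: &[u8]) -> Result<Vec<Value>, Self::Error>;
}
```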

Yes, I get that, and with Avro you might need/want to reuse schemas. But in this case there already is a cache for the information needed from the schema registry, so I don't see the added value of adding something like that to this crate. I also don't really understand how you would like to send everything in one blob, since each blob should be properly encoded.

Could you please once more try to explain how this is useful, and what would be the advantage of having it in the library? I'm still failing to see why you couldn't batch them on your side, since the schema information is cached anyway.

Let me try to explain the problem once more.

There is a typical web app which opens a DB transaction at the beginning of request handling and commits the transaction at the end if all goes well.

In the process of request handling it might accumulate more than one event to send afterwards.

The strategy of event transfer in this app is to send the event(s) right before committing the transaction, so that we know the sent events represent a consistent state.

The problem with this approach is that, when there are multiple events to send, the application could fail somewhere in the middle, so only some events are successfully sent while the transaction gets rolled back.

The solution I see is to serialize those events into a single blob, send them at once, and deserialize this blob into separate events on the other end (avro-rs does a similar thing).

Unfortunately I can't see how the "batching" you mentioned can be done otherwise.

The problem is that this library is only concerned with getting the messages into the correct binary format. You could, for example, create 5 Kafka records one after the other using this library, before sending them to a broker.
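A sketch of that flow, assuming the payloads were already produced by this crate's encoder, and using the rdkafka `FutureProducer`; the topic name and timeout are placeholders:

```rust
use std::time::Duration;

use rdkafka::producer::{FutureProducer, FutureRecord};

/// Send a batch of already-encoded payloads, one Kafka record each.
/// Because encoding happened up front, nothing is produced if any
/// of the events failed to encode.
async fn send_batch(
    producer: &FutureProducer,
    payloads: Vec<Vec<u8>>,
) -> Result<(), rdkafka::error::KafkaError> {
    for payload in &payloads {
        producer
            .send(
                FutureRecord::<(), _>::to("events").payload(payload),
                Duration::from_secs(5),
            )
            .await
            .map_err(|(err, _msg)| err)?;
    }
    Ok(())
}
```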

But Kafka doesn't have the concept of transactions, so even if you send all 5 records to Kafka right after one another, it might be that some fail and some succeed. Although with the correct settings you can be almost sure they will always be sent in the same batch, so they will either all fail or all succeed.
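For example, a producer configured roughly like this (a sketch using rdkafka; the values are illustrative, not a recommendation from this crate) makes it very likely that records produced back-to-back end up in the same produce batch:

```rust
use rdkafka::config::ClientConfig;
use rdkafka::producer::FutureProducer;

fn build_producer() -> FutureProducer {
    // Idempotence plus a small linger window keeps back-to-back records
    // together in one batch in most cases.
    ClientConfig::new()
        .set("bootstrap.servers", "localhost:9092")
        .set("enable.idempotence", "true")
        .set("acks", "all")
        .set("linger.ms", "50")
        .create()
        .expect("producer creation failed")
}
```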

Having multiple events in one message isn't supported by the schema registry. You could have a schema with references to the different parts, so you can send all the needed information in one Kafka record, which is also supported by this library.
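As a rough illustration of the "everything in one Kafka record" idea, here is an inline variant (not actual schema references, which are registered per subject): a wrapper Avro schema whose single field is an array of event records; all names are made up:

```rust
/// Wrapper schema: one Kafka record carrying an array of event records.
/// Field and type names are purely illustrative.
const BATCH_SCHEMA: &str = r#"
{
  "type": "record",
  "name": "EventBatch",
  "fields": [
    {
      "name": "events",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "Event",
          "fields": [
            {"name": "kind", "type": "string"},
            {"name": "payload", "type": "bytes"}
          ]
        }
      }
    }
  ]
}
"#;
```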

An alternative approach would be to use a relational DB and put all the messages there in a transaction, optionally using the outbox pattern, and then use Debezium to get them from the database to Kafka.

I'm closing this issue, since it's a problem this library can't fix.