gklijs / schema_registry_converter

A crate to convert bytes to something more usable, and the other way around, in a way compatible with the Confluent Schema Registry. Supports Avro, Protobuf, and JSON Schema, with both async and blocking APIs.

[Proposal] Enhancements to improve DevEx and align the crate with the Confluent Schema Registry Java library

FaveroFerreira opened this issue

Feature Description

I'd like to propose a few changes aimed at improving the developer experience and bringing the crate in line with the Java library's standards.

These changes are intended to make development and testing more straightforward and enhance the overall usability of the crate.

Here are the proposed modifications:

Introduce SchemaRegistryClient trait:

Using a trait, we can significantly facilitate testing in applications that use this crate. It will be easier to mock and test components that depend on it.

Refactor SrSettingsBuilder:

Rename it to SchemaRegistryClientConfig and make its methods return Self. This will improve configuration setup in applications that use this crate.
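As an illustration, a minimal sketch of the chained style (the field names and setters here are placeholders, not a final API):

use std::time::Duration;

pub struct SchemaRegistryClientConfig {
    urls: Vec<String>,
    timeout: Duration,
}

impl SchemaRegistryClientConfig {
    pub fn new(url: &str) -> Self {
        Self {
            urls: vec![url.to_owned()],
            timeout: Duration::from_secs(30),
        }
    }

    // Every method consumes and returns Self, so calls chain.
    pub fn add_url(mut self, url: &str) -> Self {
        self.urls.push(url.to_owned());
        self
    }

    pub fn timeout(mut self, timeout: Duration) -> Self {
        self.timeout = timeout;
        self
    }
}

fn main() {
    let _config = SchemaRegistryClientConfig::new("http://localhost:8081")
        .add_url("http://localhost:8082")
        .timeout(Duration::from_secs(5));
}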

Refactor SrSettings:

  • Rename it to CachedSchemaRegistryClient, and implement SchemaRegistryClient for it.
  • Move the caches from AvroEncoder and AvroDecoder into the new client implementation. This will let us share cached schemas between the encoding and decoding processes (see the sketch below).
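A rough sketch of that shape; the cache type and field names are assumptions, just to show where the cache would live:

use std::collections::HashMap;
use std::sync::{Arc, RwLock};

pub struct CachedSchemaRegistryClient {
    base_url: String,
    // Single cache of raw schemas by id, owned by the client and
    // shared by every encoder/decoder instead of per-component copies.
    cache: Arc<RwLock<HashMap<u32, String>>>,
}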

These changes will make this crate consistent with the Confluent Schema Registry Java library and improve the code structure.

Use dynamic dispatch in AvroEncoder and AvroDecoder:

This will enable users to provide any implementation of SchemaRegistryClient to encoders and decoders.
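As a sketch, with an empty stub standing in for the proposed trait, the encoder would hold a trait object rather than a concrete client:

use std::sync::Arc;

// Stub of the proposed trait, just enough for the example.
pub trait SchemaRegistryClient: Send + Sync {}

pub struct AvroEncoder {
    // Any implementation can be injected, including a test mock.
    client: Arc<dyn SchemaRegistryClient>,
}

impl AvroEncoder {
    pub fn new(client: Arc<dyn SchemaRegistryClient>) -> Self {
        Self { client }
    }
}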

I understand that these changes are not trivial. @gklijs, if you are okay with it, I can work on it myself and open a PR.

It's indeed not trivial. I'm also wondering what would be expected to be in the SchemaRegistryClient trait. Furthermore, if big changes are being made, I wonder whether we shouldn't have a separate schema_registry_client module/crate for the HTTP calls to Schema Registry, implementing the whole (generated) API, and separate modules/crates for the different types. And if we are making changes anyway, I'm wondering whether we should still support blocking, or only keep async.

Furthermore, if big changes are being made, I wonder whether we shouldn't have a separate schema_registry_client module/crate for the HTTP calls to Schema Registry, implementing the whole (generated) API, and separate modules/crates for the different types.

I feel that splitting the code into multiple crates is the right way to go. IMHO, the schema-registry-client should just be responsible for making HTTP calls to Schema Registry and caching the retrieved schemas.

Additional functionality that uses schema-registry-client should be provided by a separate crate in the workspace.

And if we are making changes anyway, I'm wondering whether we should still support blocking, or only keep async.

Tough question... I don't know. Supporting both async and sync code in Rust is not easy. I personally have never used the blocking version of this crate, but I can't speak for everyone.

I'm also wondering what would be expected to be in the SchemaRegistryClient trait.

I think we should start by providing a method for each call that the current SrCall enum supports, and build from there.

Something like:

use async_trait::async_trait;

#[async_trait]
pub trait SchemaRegistryClient {
    async fn get_by_id(&self, id: u32) -> Result<TBD, SchemaRegistryClientError>;
    async fn get_latest_by_subject(&self, subject: &str) -> Result<TBD, SchemaRegistryClientError>;
    async fn get_by_subject_and_version(&self, subject: &str, version: u32) -> Result<TBD, SchemaRegistryClientError>;
    async fn post_schema(&self, subject: &str, schema: &Schema) -> Result<TBD, SchemaRegistryClientError>;
}
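To make the testing benefit concrete, here is a simplified, self-contained variant of the trait above, reduced to a single method and with a made-up RegisteredSchema type standing in for the TBDs; a unit test could then inject the mock instead of a live registry:

use async_trait::async_trait;

// Illustrative stand-ins for the TBD pieces above.
pub struct RegisteredSchema {
    pub id: u32,
    pub raw: String,
}

#[derive(Debug)]
pub struct SchemaRegistryClientError(pub String);

#[async_trait]
pub trait SchemaRegistryClient {
    async fn get_by_id(&self, id: u32) -> Result<RegisteredSchema, SchemaRegistryClientError>;
}

// A mock that never touches the network: ideal for unit tests.
pub struct MockSchemaRegistryClient;

#[async_trait]
impl SchemaRegistryClient for MockSchemaRegistryClient {
    async fn get_by_id(&self, id: u32) -> Result<RegisteredSchema, SchemaRegistryClientError> {
        Ok(RegisteredSchema {
            id,
            raw: r#"{"type": "string"}"#.to_owned(),
        })
    }
}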

It looks good and aligns with going async only.
Now that I've read the first post more carefully: I'm not sure it's a good thing performance-wise to have a cache only in the client, as some additional computation is currently cached as well. But we could add a second cache if needed. For Protobuf, for example, it would be nice not to have to run the proto resolver each time on the same cached JSON.

Yeah, I realised we could benefit from a second cache per decoder, because we actually need to parse the schemas just once.
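For example (types purely illustrative), a decoder could layer its own parsed-schema cache on top of the client's raw-schema cache:

use std::collections::HashMap;

// Hypothetical parsed representation, e.g. the proto resolver's
// output for Protobuf, or an apache_avro::Schema for Avro.
pub struct ParsedSchema;

pub struct AvroDecoder {
    // Second cache: schema id -> parsed schema, so the raw JSON
    // fetched (and cached) by the client is parsed only once per id.
    parsed: HashMap<u32, ParsedSchema>,
}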

Also, I did split it into multiple crates. I ended up with something like:

schema_registry_converter
├── Cargo.toml
├── (...)
├── schema-registry-client (HTTP client and configuration)
│   ├── lib.rs
│   └── (...)
├── schema-registry-serde (core types for serialization and deserialization)
│   ├── lib.rs
│   └── (...)
├── schema-registry-serde-avro (specifics about Avro)
│   ├── lib.rs
│   └── (...)
├── schema-registry-serde-protobuf (specifics about Protobuf)
│   ├── lib.rs
│   └── (...)
└── schema-registry-serde-json (specifics about JSON)
    ├── lib.rs
    └── (...)

@FaveroFerreira FYI, I just updated some dependencies, and some of them required changes to either the main code or the tests. I still think this is a good idea; it's just also a lot of work.