gklijs / schema_registry_converter

A crate to convert bytes to something more usable, and the other way around, in a way compatible with the Confluent Schema Registry. Supports Avro, Protobuf, and JSON Schema, with both async and blocking APIs.

[Proposal] Enhancements to improve DevEx and align the crate with the Confluent Schema Registry Java library

FaveroFerreira opened this issue

Feature Description

I'd like to propose a few changes aimed at improving the developer experience and bringing the crate in line with the Java library's standards.

These changes are intended to make development and testing more straightforward and enhance the overall usability of the crate.

Here are the proposed modifications:

Introduce SchemaRegistryClient trait:

Using a trait, we can significantly facilitate testing in applications that use this crate. It will be easier to mock and test components that depend on it.

Refactor SrSettingsBuilder:

Rename it to SchemaRegistryClientConfig and make its methods return Self. This will improve configuration setup in applications that use this crate.
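As an illustration, a minimal sketch of the chained style (the field names and setters here are placeholders, not a final API):

use std::time::Duration;

pub struct SchemaRegistryClientConfig {
    urls: Vec<String>,
    timeout: Duration,
}

impl SchemaRegistryClientConfig {
    pub fn new(url: &str) -> Self {
        Self {
            urls: vec![url.to_owned()],
            timeout: Duration::from_secs(30),
        }
    }

    // Every method consumes and returns Self, so calls chain.
    pub fn add_url(mut self, url: &str) -> Self {
        self.urls.push(url.to_owned());
        self
    }

    pub fn timeout(mut self, timeout: Duration) -> Self {
        self.timeout = timeout;
        self
    }
}

fn main() {
    let _config = SchemaRegistryClientConfig::new("http://localhost:8081")
        .add_url("http://localhost:8082")
        .timeout(Duration::from_secs(5));
}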

Refactor SrSettings:

  • Rename it to CachedSchemaRegistryClient, and implement SchemaRegistryClient for it.
  • Move the caches from AvroEncoder and AvroDecoder into the new client implementation. This will let us share cached schemas between the encoding and decoding processes (see the sketch below).
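A rough sketch of that shape; the cache type and field names are assumptions, just to show where the cache would live:

use std::collections::HashMap;
use std::sync::{Arc, RwLock};

pub struct CachedSchemaRegistryClient {
    base_url: String,
    // Single cache of raw schemas by id, owned by the client and
    // shared by every encoder/decoder instead of per-component copies.
    cache: Arc<RwLock<HashMap<u32, String>>>,
}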

These changes will make this crate consistent with the Confluent Schema Registry Java library and improve the code structure.

Use dynamic dispatch in AvroEncoder and AvroDecoder:

This will enable users to provide any implementation of SchemaRegistryClient to encoders and decoders.
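As a sketch, with an empty stub standing in for the proposed trait, the encoder would hold a trait object rather than a concrete client:

use std::sync::Arc;

// Stub of the proposed trait, just enough for the example.
pub trait SchemaRegistryClient: Send + Sync {}

pub struct AvroEncoder {
    // Any implementation can be injected, including a test mock.
    client: Arc<dyn SchemaRegistryClient>,
}

impl AvroEncoder {
    pub fn new(client: Arc<dyn SchemaRegistryClient>) -> Self {
        Self { client }
    }
}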

I understand that these changes are not trivial. @gklijs, if you are okay with it, I can work on it myself and open a PR.

It's indeed not trivial. I'm also wondering what would be expected to be in the SchemaRegistryClient trait. Furthermore, if big changes are being made, I wonder whether we shouldn't have a separate schema_registry_client module/crate for the HTTP calls to Schema Registry, implementing the whole (generated) API, and separate modules/crates for the different types. And if we are making changes anyway, I'm wondering whether we should still support blocking, or only keep async.

Furthermore, if big changes are being made, I wonder whether we shouldn't have a separate schema_registry_client module/crate for the HTTP calls to Schema Registry, implementing the whole (generated) API, and separate modules/crates for the different types.

I feel that splitting the code into multiple crates is the right way to go. IMHO, the schema-registry-client should just be responsible for making HTTP calls to Schema Registry and caching the retrieved schemas.

Additional functionality that uses schema-registry-client should be provided by a separate crate in the workspace.

And if we are making changes anyway, I'm wondering whether we should still support blocking, or only keep async.

Tough question... I don't know. Supporting both async and sync code in Rust is not easy. I personally have never used the blocking version of this crate, but I can't speak for everyone.

I'm also wondering what would be expected to be in the SchemaRegistryClient trait.

I think we should start by providing a method for each call that the current SrCall enum supports, and build from there.

Something like:

use async_trait::async_trait;

#[async_trait]
pub trait SchemaRegistryClient {
    async fn get_by_id(&self, id: u32) -> Result<TBD, SchemaRegistryClientError>;
    async fn get_latest_by_subject(&self, subject: &str) -> Result<TBD, SchemaRegistryClientError>;
    async fn get_by_subject_and_version(&self, subject: &str, version: u32) -> Result<TBD, SchemaRegistryClientError>;
    async fn post_schema(&self, subject: &str, schema: &Schema) -> Result<TBD, SchemaRegistryClientError>;
}
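To make the testing benefit concrete, here is a simplified, self-contained variant of the trait above, reduced to a single method and with a made-up RegisteredSchema type standing in for the TBDs; a unit test could then inject the mock instead of a live registry:

use async_trait::async_trait;

// Illustrative stand-ins for the TBD pieces above.
pub struct RegisteredSchema {
    pub id: u32,
    pub raw: String,
}

#[derive(Debug)]
pub struct SchemaRegistryClientError(pub String);

#[async_trait]
pub trait SchemaRegistryClient {
    async fn get_by_id(&self, id: u32) -> Result<RegisteredSchema, SchemaRegistryClientError>;
}

// A mock that never touches the network: ideal for unit tests.
pub struct MockSchemaRegistryClient;

#[async_trait]
impl SchemaRegistryClient for MockSchemaRegistryClient {
    async fn get_by_id(&self, id: u32) -> Result<RegisteredSchema, SchemaRegistryClientError> {
        Ok(RegisteredSchema {
            id,
            raw: r#"{"type": "string"}"#.to_owned(),
        })
    }
}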

It looks good and aligns with going async only.
Now that I've read the first post more carefully: I'm not sure it's a good thing performance-wise to have a cache only in the client, as some additional computation is currently cached as well. But we could add a second cache if needed. For Protobuf, for example, it would be nice not to have to run the proto resolver each time on the same cached JSON.

Yeah, I realised we could benefit from a second cache per decoder, because we actually need to parse the schemas just once.
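For example (types purely illustrative), a decoder could layer its own parsed-schema cache on top of the client's raw-schema cache:

use std::collections::HashMap;

// Hypothetical parsed representation, e.g. the proto resolver's
// output for Protobuf, or an apache_avro::Schema for Avro.
pub struct ParsedSchema;

pub struct AvroDecoder {
    // Second cache: schema id -> parsed schema, so the raw JSON
    // fetched (and cached) by the client is parsed only once per id.
    parsed: HashMap<u32, ParsedSchema>,
}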

Also, I did split it into multiple crates. I ended up with something like:

schema_registry_converter
├── Cargo.toml
├── (...)
├── schema-registry-client (HTTP client and configuration)
│   ├── lib.rs
│   └── (...)
├── schema-registry-serde (core types for serialization and deserialization)
│   ├── lib.rs
│   └── (...)
├── schema-registry-serde-avro (specifics about Avro)
│   ├── lib.rs
│   └── (...)
├── schema-registry-serde-protobuf (specifics about Protobuf)
│   ├── lib.rs
│   └── (...)
└── schema-registry-serde-json (specifics about JSON)
    ├── lib.rs
    └── (...)

@FaveroFerreira FYI, I just updated some dependencies, and some of them required changes to either the main code or the tests. I still think this is a good idea; it's just also a lot of work.