trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Support Confluent Schema Registry

jeqo opened this issue

Confluent Schema Registry is commonly used to store Avro schemas and to reduce Kafka record size by storing a schema ID in each record message instead of the full schema.

Currently, Kafka records can only be decoded with Avro schemas read from files.

I'd like to propose adding Schema Registry support to the Avro decoder.
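To make the "ID instead of the schema" point concrete, here is a minimal sketch of the framing that Confluent's Avro serializers put in front of each record value: one magic byte (0), a 4-byte big-endian schema ID, then the Avro-encoded body. The function names are hypothetical and only illustrate the layout. This is not the decoder proposed here, and it does not fetch the schema from the registry.

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0   # first byte of a Confluent-framed message
HEADER_SIZE = 5  # 1 magic byte + 4-byte big-endian schema ID


def frame_confluent_message(schema_id: int, avro_body: bytes) -> bytes:
    """Prefix an Avro-encoded body with the magic byte and schema ID (hypothetical helper)."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_body


def split_confluent_message(payload: bytes) -> Tuple[int, bytes]:
    """Return (schema_id, avro_body) from a Confluent-framed record value (hypothetical helper)."""
    if len(payload) < HEADER_SIZE:
        raise ValueError("payload too short for Confluent framing")
    magic, schema_id = struct.unpack(">bI", payload[:HEADER_SIZE])
    if magic != MAGIC_BYTE:
        raise ValueError("unexpected magic byte: %d" % magic)
    return schema_id, payload[HEADER_SIZE:]
```

A decoder would presumably use the extracted ID to look up the writer schema in the registry (and cache it) before decoding the remaining bytes as Avro.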

Work items:

  • Read (SELECT) path for Avro using Schema Registry: #6137
  • Write (INSERT) path for Avro using Schema Registry
  • Round-trip product tests (INSERT + SELECT) for Avro using Schema Registry
  • Fail fast for unsupported types: JSON, CSV, RAW
  • Run smoke test and distributed test queries against Schema Registry
  • Documentation
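As an illustration of the "fail fast" item, one possible reading is that a table backed by the schema registry should be rejected up front when its data format is not Avro, rather than failing later during decoding. The sketch below assumes that reading; the function name, parameter names, and format strings are hypothetical and are not Trino configuration properties.

```python
# Formats assumed to work with the schema registry in this sketch (hypothetical).
SUPPORTED_WITH_REGISTRY = {"avro"}


def validate_data_format(data_format: str, uses_schema_registry: bool) -> None:
    """Raise early if the schema registry is combined with an unsupported format."""
    normalized = data_format.strip().lower()
    if uses_schema_registry and normalized not in SUPPORTED_WITH_REGISTRY:
        raise ValueError(
            "data format '%s' is not supported with the schema registry; "
            "supported: %s" % (normalized, sorted(SUPPORTED_WITH_REGISTRY))
        )
```

Calling `validate_data_format("json", uses_schema_registry=True)` raises a `ValueError`, while `"avro"` passes, as does any format when the registry is not in use.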

@elonazoulay, you've been looking into this, no?

Yep, we will contribute what we have shortly. We also have a use case for this: we use the schema registry to supply metadata, and we publish keys and values using the String and Avro Kafka deserializers from Confluent.

@jeqo sounds very similar to our use case. Once we put up what we have, it would be great to collaborate on this. Some things may be very specific to our use case, and we can make them more general.

@jeqo, here is the pull request: #2361. I'm still cleaning up the part where the schema registry is used to infer the schema (i.e. without the need for JSON files).

It looks like there is a lot of overlap between #2106 and #2361 :)

Link prestodb/presto#11354

(edit) Why are there multiple repos?

There is a Facebook distribution and a community open-source distribution. Blog link explanation. I had the same question when I saw your issue, @Cricket007.

@zhenik the blog you linked above is just a subjective commentary and IMO doesn't explain much.
@Cricket007 You can see previous discussion under #380.
If you have any doubts, I encourage you to reach out on our community Slack.

Let's keep the discussion here focused on the Confluent Schema Registry.

Thanks for the links!

I only opened that issue after I saw that the Kafka reader was added.