gklijs / schema_registry_converter

A crate to convert bytes to something more usable, and the other way around, in a way compatible with the Confluent Schema Registry. Supports Avro, Protobuf, and JSON Schema, with both async and blocking APIs.

Support for schema reference validation

arkanmgerges opened this issue · comments

Is your feature request related to a problem? Please describe.
I have a use case with two steps:

  1. Creating a schema with references
  2. Validating data against that schema

I have a user schema with the fields "id" and "role_name", and a user_command schema that is used as part of a microservice and uses the user schema as a reference.

1. User and User Command schemas:

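use schema_registry_converter::schema_registry_common::{SchemaType, SuppliedReference, SuppliedSchema};

pub fn user_schema() -> SuppliedReference {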
    let schema_raw = r#"
    {
        "type":"object",
        "properties":{
            "id":{"type":"string"},
            "role_name":{"type":"string"}
        }
    }
    "#;

    SuppliedReference {
        name: String::from("com.example.user"),
        subject: String::from("com.example.user"),
        schema: String::from(schema_raw),
        references: vec![],
    }
}

pub fn user_command_schema() -> SuppliedSchema {
    SuppliedSchema { 
        name: Some(String::from("com.example.user_command")), 
        schema_type: SchemaType::Json, 
        schema: r#"
        {
            "properties":{
                "id":{"type":"string"},
                "name":{"type":"string"},
                "metadata":{"type":"string"},
                "creator_service_name":{"type":"string"},
                "created_on":{"type":"integer"},
                "data": {
                    "$ref": "{}"
                } 
            }
        }"#.to_string().replace("{}", &user_schema().subject), 
        references: vec![user_schema()]
    }
}

2. Registering the schemas into the schema registry:

use schema_registry_converter::async_impl::schema_registry::{post_schema, SrSettings};

let schema_registry_url = "localhost:9001".to_string();
let subject = "com.example.user_command".to_string();
let result = post_schema(
    &SrSettings::new(schema_registry_url),
    subject,
    user_command_schema(),
)
.await
.unwrap();

println!("result: {:?}", result);
println!("Schema registry creation is done");

Up to this point there is no problem, and the schema is registered in the schema registry.

The problem arises when I need to validate the data and produce it to Kafka: how can I use my user_command_schema to validate my data before it is produced to Kafka?

If I try to encode the data using the schema, it fails when this code runs:

// source at: https://github.com/gklijs/schema_registry_converter/blob/master/src/async_impl/json.rs#L237

fn reference_url(rr: &RegisteredReference) -> Result<Url, SRCError> {
    match Url::from_str(&*rr.name) {
        Ok(v) => Ok(v),
        Err(e) => Err(SRCError::non_retryable_with_cause(e, &*format!("reference schema with subject {} and version {} has invalid id {}, it has to be a fully qualified url", rr.subject, rr.version, rr.name)))
    }
}
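
For context: `Url::from_str` rejects a bare name such as `com.example.user` because it has no scheme and there is no base URL to resolve it against. The snippet below is a minimal, stdlib-only sketch of that kind of check. `has_url_scheme` is a hypothetical helper written for illustration; it is not part of schema_registry_converter and is much less strict than the `url` crate's parser.

```rust
/// Hypothetical helper: true when `s` begins with a URL scheme followed by `:`.
/// A scheme is a letter followed by letters, digits, `+`, `-`, or `.` (RFC 3986).
fn has_url_scheme(s: &str) -> bool {
    let scheme = match s.split_once(':') {
        Some((scheme, _rest)) => scheme,
        None => return false, // no `:` at all, so there cannot be a scheme
    };
    let mut chars = scheme.chars();
    // The first character of a scheme must be an ASCII letter.
    match chars.next() {
        Some(first) if first.is_ascii_alphabetic() => {}
        _ => return false,
    }
    // The remaining characters are limited to a small ASCII set.
    chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'))
}
```

With this check, the reference name `com.example.user` is not treated as an absolute URL, while `http://localhost:9001/com.example.user` is.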

Describe the solution you'd like
In my JSON example, I would like producer.send_json to validate the data before sending it to Kafka. For a transactional producer/consumer, the data should likewise be validated before sending (for the producer) and after receiving (for the consumer).

Hi, I tried changing the code so that the referenced schema is resolved as a URL.
I got the code to work, but the problem is that the value I'm passing does not match the schema above, and yet the validation from valico (https://github.com/gklijs/schema_registry_converter/blob/v2.1.0/src/async_impl/json.rs#L100) returns a ValidationState with an empty errors array.

My changes were as follows:

// https://github.com/gklijs/schema_registry_converter/blob/v2.1.0/src/async_impl/json.rs#L185
fn reference_url(rr: &RegisteredReference) -> Result<Url, SRCError> {
    match Url::from_str(&*rr.name) {
        Ok(v) => Ok(v),
        Err(e) => Err(SRCError::non_retryable_with_cause(e, &*format!("reference schema with subject {} and version {} has invalid id {}, it has to be a fully qualified url", rr.subject, rr.version, rr.name)))
    }
}

To

fn reference_url(rr: &RegisteredReference, sr_settings: Option<&SrSettings>) -> Result<Url, SRCError> {
    match Url::from_str(&*rr.name) {
        Ok(v) => Ok(v),
        Err(e) => {
            match sr_settings {
                Some(sr) => {
                    Ok(Url::from_str(&format!("{}/{}", sr.url(), rr.name.clone())).unwrap())
                },
                _ => Err(SRCError::non_retryable_with_cause(e, &*format!("reference schema with subject {} and version {} has invalid id {}, it has to be a fully qualified url", rr.subject, rr.version, rr.name)))
            }
        }
    }
}

and the line:
// https://github.com/gklijs/schema_registry_converter/blob/v2.1.0/src/async_impl/json.rs#L207
let url = reference_url(&rr)?;
to this:
let url = reference_url(&rr, Some(sr_settings))?;
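
The fallback in this patch amounts to: keep a reference name that is already absolute, otherwise append it to the registry base URL. Here is a stdlib-only sketch of that idea working on plain strings. `resolve_reference` is a hypothetical name, the `://` test is a simplification of real URL parsing, and the slash trimming is an addition that the patch itself does not do.

```rust
/// Hypothetical helper mirroring the fallback idea in the patch:
/// absolute names are returned unchanged, bare names are appended to the base.
fn resolve_reference(name: &str, registry_base: &str) -> String {
    if name.contains("://") {
        // Simplified test for "already a fully qualified URL".
        name.to_string()
    } else {
        // Trim slashes on both sides so exactly one `/` joins base and name.
        format!(
            "{}/{}",
            registry_base.trim_end_matches('/'),
            name.trim_start_matches('/')
        )
    }
}
```

Called with `ro.esmartbill.id_and_access.user` and `http://localhost:9001`, this yields `http://localhost:9001/ro.esmartbill.id_and_access.user`, which is the key under which the referenced schema shows up in the scope printed further down.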

I also added println!() calls here to inspect the values:

// https://github.com/gklijs/schema_registry_converter/blob/v2.1.0/src/async_impl/json.rs#L97

pub fn validate(schema: JsonSchema, value: &Value) -> Result<(), SRCError> {
    let mut scope = Scope::new();
    let schema = add_refs_to_scope(&mut scope, schema)?;
    let validation = schema.validate(value);

    println!("---- value variable is ---- \n{:?}", value);
    println!("---- schema variable is ---- \n{:?}", schema);
    println!("---- validation variable is ---- \n{:?}", validation);

    handle_validation(validation, value)
}

---- value variable is ----
Object({"age": Number(43), "name": String("John Doe"), "phones": Array([String("+44 1234567"), String("+44 2345678")])})


---- schema variable is ----
ScopedSchema { scope: Scope { keywords: {"maxLength": KeywordConsumer { keys: ["maxLength"], keyword: <keyword> }, "properties": KeywordConsumer { keys: ["properties", "additionalProperties", "patternProperties"], keyword: <keyword>
 }, "format": KeywordConsumer { keys: ["format"], keyword: <keyword> }, "type": KeywordConsumer { keys: ["type"], keyword: <keyword> }, "patternProperties": KeywordConsumer { keys: ["properties", "additionalProperties", "patternProp
erties"], keyword: <keyword> }, "required": KeywordConsumer { keys: ["required"], keyword: <keyword> }, "maximum": KeywordConsumer { keys: ["maximum"], keyword: <keyword> }, "additionalItems": KeywordConsumer { keys: ["items", "addi
tionalItems"], keyword: <keyword> }, "not": KeywordConsumer { keys: ["not"], keyword: <keyword> }, "if": KeywordConsumer { keys: ["if", "then", "else"], keyword: <keyword> }, "enum": KeywordConsumer { keys: ["enum"], keyword: <keywo
rd> }, "minProperties": KeywordConsumer { keys: ["minProperties"], keyword: <keyword> }, "else": KeywordConsumer { keys: ["if", "then", "else"], keyword: <keyword> }, "minLength": KeywordConsumer { keys: ["minLength"], keyword: <key
word> }, "contentEncoding": KeywordConsumer { keys: ["contentMediaType", "contentEncoding"], keyword: <keyword> }, "$ref": KeywordConsumer { keys: ["$ref"], keyword: <keyword> }, "anyOf": KeywordConsumer { keys: ["anyOf"], keyword: 
<keyword> }, "multipleOf": KeywordConsumer { keys: ["multipleOf"], keyword: <keyword> }, "pattern": KeywordConsumer { keys: ["pattern"], keyword: <keyword> }, "then": KeywordConsumer { keys: ["if", "then", "else"], keyword: <keyword
> }, "const": KeywordConsumer { keys: ["const"], keyword: <keyword> }, "oneOf": KeywordConsumer { keys: ["oneOf"], keyword: <keyword> }, "contentMediaType": KeywordConsumer { keys: ["contentMediaType", "contentEncoding"], keyword: <
keyword> }, "minimum": KeywordConsumer { keys: ["minimum"], keyword: <keyword> }, "allOf": KeywordConsumer { keys: ["allOf"], keyword: <keyword> }, "exclusiveMaximum": KeywordConsumer { keys: ["exclusiveMaximum"], keyword: <keyword>
 }, "maxProperties": KeywordConsumer { keys: ["maxProperties"], keyword: <keyword> }, "additionalProperties": KeywordConsumer { keys: ["properties", "additionalProperties", "patternProperties"], keyword: <keyword> }, "dependencies":
 KeywordConsumer { keys: ["dependencies"], keyword: <keyword> }, "uniqueItems": KeywordConsumer { keys: ["uniqueItems"], keyword: <keyword> }, "minItems": KeywordConsumer { keys: ["minItems"], keyword: <keyword> }, "items": KeywordC
onsumer { keys: ["items", "additionalItems"], keyword: <keyword> }, "exclusiveMinimum": KeywordConsumer { keys: ["exclusiveMinimum"], keyword: <keyword> }, "maxItems": KeywordConsumer { keys: ["maxItems"], keyword: <keyword> }, "con
tains": KeywordConsumer { keys: ["contains"], keyword: <keyword> }, "propertyNames": KeywordConsumer { keys: ["propertyNames"], keyword: <keyword> }}, schemes: {"http://localhost:9001/id/26.json": Schema { id: Some(Url { scheme: "ht
tp", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("localhost")), port: Some(9001), path: "/id/26.json", query: None, fragment: None }), schema: None, original: Object({"properties": Object({"created_on": 
Object({"type": String("integer")}), "creator_service_name": Object({"type": String("string")}), "data": Object({"$ref": String("ro.esmartbill.id_and_access.user")}), "id": Object({"type": String("string")}), "metadata": Object({"ty
pe": String("string")}), "name": Object({"type": String("string")})})}), tree: {"properties": Schema { id: None, schema: None, original: Object({"created_on": Object({"type": String("integer")}), "creator_service_name": Object({"typ
e": String("string")}), "data": Object({"$ref": String("ro.esmartbill.id_and_access.user")}), "id": Object({"type": String("string")}), "metadata": Object({"type": String("string")}), "name": Object({"type": String("string")})}), tr
ee: {"created_on": Schema { id: None, schema: None, original: Object({"type": String("integer")}), tree: {}, validators: [<validator>], scopes: {}, default: RefCell { value: None } }, "creator_service_name": Schema { id: None, schem
a: None, original: Object({"type": String("string")}), tree: {}, validators: [<validator>], scopes: {}, default: RefCell { value: None } }, "data": Schema { id: None, schema: None, original: Object({"$ref": String("ro.esmartbill.id_
and_access.user")}), tree: {}, validators: [<validator>], scopes: {}, default: RefCell { value: None } }, "id": Schema { id: None, schema: None, original: Object({"type": String("string")}), tree: {}, validators: [<validator>], scop
es: {}, default: RefCell { value: None } }, "metadata": Schema { id: None, schema: None, original: Object({"type": String("string")}), tree: {}, validators: [<validator>], scopes: {}, default: RefCell { value: None } }, "name": Sche
ma { id: None, schema: None, original: Object({"type": String("string")}), tree: {}, validators: [<validator>], scopes: {}, default: RefCell { value: None } }}, validators: [], scopes: {}, default: RefCell { value: None } }}, valida
tors: [<validator>], scopes: {}, default: RefCell { value: None } }, "http://localhost:9001/ro.esmartbill.id_and_access.user": Schema { id: Some(Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some
(Domain("localhost")), port: Some(9001), path: "/ro.esmartbill.id_and_access.user", query: None, fragment: None }), schema: None, original: Object({"properties": Object({"id": Object({"type": String("string")}), "role_name": Object(
{"type": String("string")})}), "type": String("object")}), tree: {"properties": Schema { id: None, schema: None, original: Object({"id": Object({"type": String("string")}), "role_name": Object({"type": String("string")})}), tree: {"
id": Schema { id: None, schema: None, original: Object({"type": String("string")}), tree: {}, validators: [<validator>], scopes: {}, default: RefCell { value: None } }, "role_name": Schema { id: None, schema: None, original: Object(
{"type": String("string")}), tree: {}, validators: [<validator>], scopes: {}, default: RefCell { value: None } }}, validators: [], scopes: {}, default: RefCell { value: None } }}, validators: [<validator>, <validator>], scopes: {}, 
default: RefCell { value: None } }}, supply_defaults: false }, schema: Schema { id: Some(Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("localhost")), port: Some(9001), path: "/id/26.j
son", query: None, fragment: None }), schema: None, original: Object({"properties": Object({"created_on": Object({"type": String("integer")}), "creator_service_name": Object({"type": String("string")}), "data": Object({"$ref": Strin
g("ro.esmartbill.id_and_access.user")}), "id": Object({"type": String("string")}), "metadata": Object({"type": String("string")}), "name": Object({"type": String("string")})})}), tree: {"properties": Schema { id: None, schema: None,
 original: Object({"created_on": Object({"type": String("integer")}), "creator_service_name": Object({"type": String("string")}), "data": Object({"$ref": String("ro.esmartbill.id_and_access.user")}), "id": Object({"type": String("st
ring")}), "metadata": Object({"type": String("string")}), "name": Object({"type": String("string")})}), tree: {"created_on": Schema { id: None, schema: None, original: Object({"type": String("integer")}), tree: {}, validators: [<val
idator>], scopes: {}, default: RefCell { value: None } }, "creator_service_name": Schema { id: None, schema: None, original: Object({"type": String("string")}), tree: {}, validators: [<validator>], scopes: {}, default: RefCell { val
ue: None } }, "data": Schema { id: None, schema: None, original: Object({"$ref": String("ro.esmartbill.id_and_access.user")}), tree: {}, validators: [<validator>], scopes: {}, default: RefCell { value: None } }, "id": Schema { id: N
one, schema: None, original: Object({"type": String("string")}), tree: {}, validators: [<validator>], scopes: {}, default: RefCell { value: None } }, "metadata": Schema { id: None, schema: None, original: Object({"type": String("str
ing")}), tree: {}, validators: [<validator>], scopes: {}, default: RefCell { value: None } }, "name": Schema { id: None, schema: None, original: Object({"type": String("string")}), tree: {}, validators: [<validator>], scopes: {}, de
fault: RefCell { value: None } }}, validators: [], scopes: {}, default: RefCell { value: None } }}, validators: [<validator>], scopes: {}, default: RefCell { value: None } } }


---- validation variable is ---- 
ValidationState { errors: [], missing: [], replacement: None }


---- value variable is ----
Object({"name": String("ok")})


---- schema variable is ----
ScopedSchema { scope: Scope { keywords: {"if": KeywordConsumer { keys: ["if", "then", "else"], keyword: <keyword> }, "anyOf": KeywordConsumer { keys: ["anyOf"], keyword: <keyword> }, "maxProperties": KeywordConsumer { keys: ["maxPro
perties"], keyword: <keyword> }, "allOf": KeywordConsumer { keys: ["allOf"], keyword: <keyword> }, "const": KeywordConsumer { keys: ["const"], keyword: <keyword> }, "not": KeywordConsumer { keys: ["not"], keyword: <keyword> }, "form
at": KeywordConsumer { keys: ["format"], keyword: <keyword> }, "items": KeywordConsumer { keys: ["items", "additionalItems"], keyword: <keyword> }, "patternProperties": KeywordConsumer { keys: ["properties", "additionalProperties", 
"patternProperties"], keyword: <keyword> }, "maxItems": KeywordConsumer { keys: ["maxItems"], keyword: <keyword> }, "propertyNames": KeywordConsumer { keys: ["propertyNames"], keyword: <keyword> }, "contentEncoding": KeywordConsumer
 { keys: ["contentMediaType", "contentEncoding"], keyword: <keyword> }, "minLength": KeywordConsumer { keys: ["minLength"], keyword: <keyword> }, "minimum": KeywordConsumer { keys: ["minimum"], keyword: <keyword> }, "exclusiveMinimu
m": KeywordConsumer { keys: ["exclusiveMinimum"], keyword: <keyword> }, "dependencies": KeywordConsumer { keys: ["dependencies"], keyword: <keyword> }, "maxLength": KeywordConsumer { keys: ["maxLength"], keyword: <keyword> }, "requi
red": KeywordConsumer { keys: ["required"], keyword: <keyword> }, "contentMediaType": KeywordConsumer { keys: ["contentMediaType", "contentEncoding"], keyword: <keyword> }, "uniqueItems": KeywordConsumer { keys: ["uniqueItems"], key
word: <keyword> }, "maximum": KeywordConsumer { keys: ["maximum"], keyword: <keyword> }, "minProperties": KeywordConsumer { keys: ["minProperties"], keyword: <keyword> }, "oneOf": KeywordConsumer { keys: ["oneOf"], keyword: <keyword
> }, "additionalItems": KeywordConsumer { keys: ["items", "additionalItems"], keyword: <keyword> }, "exclusiveMaximum": KeywordConsumer { keys: ["exclusiveMaximum"], keyword: <keyword> }, "pattern": KeywordConsumer { keys: ["pattern
"], keyword: <keyword> }, "else": KeywordConsumer { keys: ["if", "then", "else"], keyword: <keyword> }, "then": KeywordConsumer { keys: ["if", "then", "else"], keyword: <keyword> }, "enum": KeywordConsumer { keys: ["enum"], keyword:
 <keyword> }, "minItems": KeywordConsumer { keys: ["minItems"], keyword: <keyword> }, "multipleOf": KeywordConsumer { keys: ["multipleOf"], keyword: <keyword> }, "$ref": KeywordConsumer { keys: ["$ref"], keyword: <keyword> }, "addit
ionalProperties": KeywordConsumer { keys: ["properties", "additionalProperties", "patternProperties"], keyword: <keyword> }, "type": KeywordConsumer { keys: ["type"], keyword: <keyword> }, "contains": KeywordConsumer { keys: ["conta
ins"], keyword: <keyword> }, "properties": KeywordConsumer { keys: ["properties", "additionalProperties", "patternProperties"], keyword: <keyword> }}, schemes: {"http://localhost:9001/ro.esmartbill.id_and_access.user": Schema { id: 
Some(Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("localhost")), port: Some(9001), path: "/ro.esmartbill.id_and_access.user", query: None, fragment: None }), schema: None, original: 
Object({"properties": Object({"id": Object({"type": String("string")}), "role_name": Object({"type": String("string")})}), "type": String("object")}), tree: {"properties": Schema { id: None, schema: None, original: Object({"id": Obj
ect({"type": String("string")}), "role_name": Object({"type": String("string")})}), tree: {"id": Schema { id: None, schema: None, original: Object({"type": String("string")}), tree: {}, validators: [<validator>], scopes: {}, default
: RefCell { value: None } }, "role_name": Schema { id: None, schema: None, original: Object({"type": String("string")}), tree: {}, validators: [<validator>], scopes: {}, default: RefCell { value: None } }}, validators: [], scopes: {
}, default: RefCell { value: None } }}, validators: [<validator>, <validator>], scopes: {}, default: RefCell { value: None } }, "http://localhost:9001/id/26.json": Schema { id: Some(Url { scheme: "http", cannot_be_a_base: false, use
rname: "", password: None, host: Some(Domain("localhost")), port: Some(9001), path: "/id/26.json", query: None, fragment: None }), schema: None, original: Object({"properties": Object({"created_on": Object({"type": String("integer")
}), "creator_service_name": Object({"type": String("string")}), "data": Object({"$ref": String("ro.esmartbill.id_and_access.user")}), "id": Object({"type": String("string")}), "metadata": Object({"type": String("string")}), "name": 
Object({"type": String("string")})})}), tree: {"properties": Schema { id: None, schema: None, original: Object({"created_on": Object({"type": String("integer")}), "creator_service_name": Object({"type": String("string")}), "data": O
bject({"$ref": String("ro.esmartbill.id_and_access.user")}), "id": Object({"type": String("string")}), "metadata": Object({"type": String("string")}), "name": Object({"type": String("string")})}), tree: {"created_on": Schema { id: N
one, schema: None, original: Object({"type": String("integer")}), tree: {}, validators: [<validator>], scopes: {}, default: RefCell { value: None } }, "creator_service_name": Schema { id: None, schema: None, original: Object({"type"
: String("string")}), tree: {}, validators: [<validator>], scopes: {}, default: RefCell { value: None } }, "data": Schema { id: None, schema: None, original: Object({"$ref": String("ro.esmartbill.id_and_access.user")}), tree: {}, va
lidators: [<validator>], scopes: {}, default: RefCell { value: None } }, "id": Schema { id: None, schema: None, original: Object({"type": String("string")}), tree: {}, validators: [<validator>], scopes: {}, default: RefCell { value:
 None } }, "metadata": Schema { id: None, schema: None, original: Object({"type": String("string")}), tree: {}, validators: [<validator>], scopes: {}, default: RefCell { value: None } }, "name": Schema { id: None, schema: None, orig
inal: Object({"type": String("string")}), tree: {}, validators: [<validator>], scopes: {}, default: RefCell { value: None } }}, validators: [], scopes: {}, default: RefCell { value: None } }}, validators: [<validator>], scopes: {}, 
default: RefCell { value: None } }}, supply_defaults: false }, schema: Schema { id: Some(Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("localhost")), port: Some(9001), path: "/id/26.j
son", query: None, fragment: None }), schema: None, original: Object({"properties": Object({"created_on": Object({"type": String("integer")}), "creator_service_name": Object({"type": String("string")}), "data": Object({"$ref": Strin
g("ro.esmartbill.id_and_access.user")}), "id": Object({"type": String("string")}), "metadata": Object({"type": String("string")}), "name": Object({"type": String("string")})})}), tree: {"properties": Schema { id: None, schema: None,
 original: Object({"created_on": Object({"type": String("integer")}), "creator_service_name": Object({"type": String("string")}), "data": Object({"$ref": String("ro.esmartbill.id_and_access.user")}), "id": Object({"type": String("st
ring")}), "metadata": Object({"type": String("string")}), "name": Object({"type": String("string")})}), tree: {"created_on": Schema { id: None, schema: None, original: Object({"type": String("integer")}), tree: {}, validators: [<val
idator>], scopes: {}, default: RefCell { value: None } }, "creator_service_name": Schema { id: None, schema: None, original: Object({"type": String("string")}), tree: {}, validators: [<validator>], scopes: {}, default: RefCell { val
ue: None } }, "data": Schema { id: None, schema: None, original: Object({"$ref": String("ro.esmartbill.id_and_access.user")}), tree: {}, validators: [<validator>], scopes: {}, default: RefCell { value: None } }, "id": Schema { id: N
one, schema: None, original: Object({"type": String("string")}), tree: {}, validators: [<validator>], scopes: {}, default: RefCell { value: None } }, "metadata": Schema { id: None, schema: None, original: Object({"type": String("str
ing")}), tree: {}, validators: [<validator>], scopes: {}, default: RefCell { value: None } }, "name": Schema { id: None, schema: None, original: Object({"type": String("string")}), tree: {}, validators: [<validator>], scopes: {}, de
fault: RefCell { value: None } }}, validators: [], scopes: {}, default: RefCell { value: None } }}, validators: [<validator>], scopes: {}, default: RefCell { value: None } } }


---- validation variable is ----
ValidationState { errors: [], missing: [], replacement: None }

I tried using valico::json_schema from valico, but it does not seem to validate correctly (maybe I'm doing something wrong); using the DSL via json_dsl::Builder does work.
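
One possible reading of the empty `errors` arrays, offered as an assumption rather than a diagnosis of valico: the registered schema declares `properties` but has no `required` list and no `"additionalProperties": false`. In JSON Schema, `properties` only constrains properties that are present, and undeclared properties are allowed by default, so both printed values would be valid. The toy model below illustrates those semantics with stdlib types only. `Json`, `Ty`, `ObjectSchema`, and `validate` are hypothetical names, not valico's API, and the `$ref` property `data` is left out.

```rust
/// Tiny stand-in for a JSON value; only the variants this illustration needs.
#[allow(dead_code)]
#[derive(Debug)]
enum Json {
    Str(String),
    Int(i64),
    List(Vec<Json>),
}

/// Primitive types used by the schemas in this thread.
#[derive(Debug, Clone, Copy)]
enum Ty {
    Str,
    Int,
}

/// Toy model of an object schema (hypothetical, not valico's types).
struct ObjectSchema {
    properties: Vec<(&'static str, Ty)>,
    required: Vec<&'static str>,
    additional_properties: bool,
}

/// Returns one message per violation; an empty vector means the object is valid.
fn validate(schema: &ObjectSchema, object: &[(&str, Json)]) -> Vec<String> {
    let mut errors = Vec::new();
    // `required` is the only keyword here that makes a property mandatory.
    for name in &schema.required {
        if !object.iter().any(|(key, _)| key == name) {
            errors.push(format!("missing required property `{}`", name));
        }
    }
    for (key, value) in object {
        match schema.properties.iter().find(|(name, _)| name == key) {
            // A declared property only constrains the value when it is present.
            Some((_, ty)) => {
                let type_matches = matches!(
                    (ty, value),
                    (Ty::Str, Json::Str(_)) | (Ty::Int, Json::Int(_))
                );
                if !type_matches {
                    errors.push(format!("property `{}` has the wrong type", key));
                }
            }
            // Undeclared properties are accepted unless explicitly forbidden.
            None if !schema.additional_properties => {
                errors.push(format!("unexpected property `{}`", key));
            }
            None => {}
        }
    }
    errors
}
```

In this model, an object with `age`, `name`, and `phones` passes a schema shaped like user_command with no errors. It only starts failing once `required` lists a missing property such as `id`, or `additional_properties` is set to `false`. If the same holds for the real schema, adding `required` and `"additionalProperties": false` to it may be enough to make the validation report errors.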

Sorry for not responding earlier.
There is a new release of valico; maybe that works? Since with JSON the data should already be in the correct format and no real conversion is needed, I think it would be nice to have something like a validate function for producing data, which just checks the data against the schema registered for the subject in the schema registry.

Hi @gklijs, unfortunately I'm working on other projects in other programming languages (not Rust), and I don't know whether I will have a project that uses Rust and schema validation any time soon.
Anyway thank you for the reply.

I'm not picking this up for 3.0.0. If you read this and want it added, please let me know by commenting on this issue.