GSoC: Define upgrade/downgrade language agnostic declarative transformation rules for all JSON Schema dialects

Question

jviotti opened this issue 5 months ago · comments

Project title

Define upgrade/downgrade language agnostic declarative transformation rules for all JSON Schema dialects.

Brief Description

The Alterschema project defines a set of JSON-based formal transformation rules for upgrading schemas between Draft 4 and 2020-12, and all dialects in between. These rules are defined using JSON Schema and JSON-e and live within the Alterschema project.

We would like to revise these rules, extend them to support every dialect of JSON Schema (potentially including OpenAPI's old dialects too), and attempt to support some level of downgrading.

Instead of having these rules on the Alterschema repository, we want to have them on the JSON Schema organization for everybody to consume, including Alterschema itself.

Revising the rule format should consider currently unresolved edge cases in Alterschema like tweaking references after a subschema is moved.

Expected Outcomes

A new repository in the JSON Schema organization with upgrade/downgrade rules defined using JSON.

Skills Required

Understanding of various dialects of JSON Schema and their differences.

Mentors

@jviotti

Expected Difficulty

Medium

Expected Time Commitment

350 hours

Benjamin Granados · Answer 1 · Wed Jan 31 2024 18:30:07 GMT+0800 (China Standard Time)

Thanks Juan. This looks amazing!

Suprith KG · Answer 2 · Fri Feb 23 2024 05:07:38 GMT+0800 (China Standard Time)

Hey @jviotti, I read through the problem statement and loved how clearly the description is written. I would love to work on this under GSoC with the mentors. Can you help me understand it better and tell me where to start? 😁
Also, would it be good to read through all of the repositories?

Juan Cruz Viotti · Answer 3 · Fri Feb 23 2024 23:50:01 GMT+0800 (China Standard Time)

Hey there! I'd first suggest getting acquainted with https://github.com/sourcemeta/alterschema. This is the original project where I prototyped something like what we want to do here, using JSON-e (https://json-e.js.org), but ended up hitting some blockers. You can take a look at all the upgrade transformation rules I support here: https://github.com/sourcemeta/alterschema/tree/master/rules. Try to read them, and understand them mainly in conjunction with JSON Schema's official migration guide: https://json-schema.org/specification#migrating-from-older-drafts.

The way Alterschema works is pretty simple. It recursively traverses every subschema of the given schema in a top-down manner, applying all the rules it knows about to every subschema over and over again until no more transformation rules can be executed. The core business logic is literally a small JavaScript file: https://github.com/sourcemeta/alterschema/blob/master/bindings/node/index.js
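The loop described above can be sketched roughly as follows. This is not the actual Alterschema code: the rule shape (`condition`/`transform`) and the name `applyRules` are hypothetical, and the traversal naively treats every nested object or array as a potential subschema, which a real implementation should not do.

```javascript
// Hypothetical rule shape (an assumption, not Alterschema's real format):
//   { condition: (schema) => boolean, transform: (schema) => schema }
function applyRules(schema, rules) {
  if (Array.isArray(schema)) {
    return schema.map((item) => applyRules(item, rules));
  }
  if (typeof schema !== 'object' || schema === null) {
    return schema;
  }

  // Keep applying rules to this subschema until none of them matches.
  let current = schema;
  let changed = true;
  while (changed) {
    changed = false;
    for (const rule of rules) {
      if (rule.condition(current)) {
        current = rule.transform(current);
        changed = true;
      }
    }
  }

  // Top-down: only after this level is stable, descend into its children.
  // Simplification: every nested value is visited as if it were a subschema.
  return Object.fromEntries(
    Object.entries(current).map(([key, value]) => [key, applyRules(value, rules)])
  );
}

// Example rule: array-form `items` becomes `prefixItems`.
const itemsToPrefixItems = {
  condition: (schema) => Array.isArray(schema.items),
  transform: ({ items, ...rest }) => ({ ...rest, prefixItems: items })
};

const upgraded = applyRules(
  { type: 'array', items: [{ type: 'string' }] },
  [itemsToPrefixItems]
);
console.log(JSON.stringify(upgraded));
```

The loop only terminates if every rule eventually stops matching its own output, so each rule's condition has to become false once the rule has been applied.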

For example, Alterschema rules for upgrading JSON Schema 2019-09 to 2020-12 are defined here: https://github.com/sourcemeta/alterschema/blob/master/rules/jsonschema-2019-09-to-2020-12.json, based on what JSON Schema published here: https://json-schema.org/draft/2020-12/release-notes.

Now, what we would like to do in this GSoC initiative is learn from what we did in Alterschema to do another take on the problem that solves the limitations of Alterschema. The main limitation is this one: sourcemeta/alterschema#43.

In summary, a JSON Schema may reference other parts of itself using URI encoded JSON Pointers along with the $ref and $dynamicRef keywords. The current JSON-e rules that I have on Alterschema will only look at the current subschema and blindly transform it according to what the template says.

However, what happens if there is a reference in another part of the schema that is now invalid after a transformation you did somewhere else? We don't have a deterministic way of detecting this, let alone of knowing how to "fix up" the reference pointers.

The conclusion I got from this is that JSON-e, while powerful, is too low level and doesn't carry semantics about what the transformation actually did. For example, if you upgrade definitions to $defs, that's a simple rename. Knowing that it is indeed just a simple rename, it's easy to know how to fix any pointers that included /definitions in it.

So what I'm thinking about is that we can study the transformation rules that we want to perform, and break them down into higher level sub transformations. For example, are you completely deleting something? Are we performing just a rename? Are we moving the contents around? If we design a JSON language that works at a higher level of abstraction, we can deterministically know how we should fix any affected pointer.

Juan Cruz Viotti · Answer 4 · Fri Feb 23 2024 23:51:32 GMT+0800 (China Standard Time)

So I'd say the phases in this project are like this:

Research JSON Schema transformation rules, categorize them, etc
Come up with a higher-level transformation language than JSON-e that carries semantics about how we are actually transforming the schema (I was thinking something similar to JSON Patch (https://jsonpatch.com))
Then do a prototype of implementing upgrade rules with this language, ensuring it solves the limitations of Alterschema
If we have more time, we use this language to attempt some level of downgrading support, etc

Juan Cruz Viotti · Answer 5 · Fri Feb 23 2024 23:57:04 GMT+0800 (China Standard Time)

As an initial qualifying task for this project (cc @benjagm), I propose:

Go through every upgrade transformation rule from JSON Schema 2019-09 to 2020-12 in the official upgrade guide (https://json-schema.org/draft/2020-12/release-notes) and on Alterschema (https://github.com/sourcemeta/alterschema/blob/master/rules/jsonschema-2019-09-to-2020-12.json) and categorize them on a spreadsheet/table based on what they are doing. For example, are they simple renames? Are they completely moving stuff around? Are they doing something even more complicated? Up to you to figure out how to categorize them
Propose a toy JSON-based DSL transformation language (perhaps inspired by JSON-e and JSON Patch) that encapsulates how to perform these 2019-09 to 2020-12 upgrade rules in a way that you can algorithmically tell how to fix any $ref JSON Pointer that went through the transformed schema
Describe a pseudo-algorithm to fix up $refs

Juan Cruz Viotti · Answer 6 · Sat Feb 24 2024 00:06:56 GMT+0800 (China Standard Time)

As a more specific (though probably a bit artificial and silly 😅) example of the $ref issue, consider the following JSON Schema 2019-09:

{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "type": "array",
  "items": [
    { "type": "string" },
    { "type": "number" }
  ],
  "additionalItems": { 
    "$ref": "#/items/0" 
  }
}

To turn it into a JSON Schema 2020-12, we need to:

Replace $schema with https://json-schema.org/draft/2020-12/schema
Rename /items to /prefixItems
Rename /additionalItems to /items

However, if you blindly perform these transformations, you would end up with the following schema:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "array",
  "prefixItems": [
    { "type": "string" },
    { "type": "number" }
  ],
  "items": { 
    "$ref": "#/items/0" 
  }
}

However note that the /items/$ref, which still says #/items/0 is now invalid. We first renamed prefixItems to items, so the $ref should have been updated to #/prefixItems/0 too.

This one is a bit simple, but think about more complex variations of the same problem. You might have long references where many of its components will need to be updated, and in some cases, it will be more than just a component rename.

Juan Cruz Viotti · Answer 7 · Sat Feb 24 2024 00:07:41 GMT+0800 (China Standard Time)

Or if you can think of a better way to deterministically solve this problem, please propose it and we can work on it together!

Vinit Pandit · Answer 8 · Sat Feb 24 2024 00:50:13 GMT+0800 (China Standard Time)

However note that the /items/$ref, which still says #/items/0 is now invalid. We first renamed prefixItems to items, so the $ref should have been updated to #/prefixItems/0 too.

I'm confused by this line. Are we supposed to convert prefixItems to items for the reference to be #/prefixItems/0 as part of the conversion from 2019-09 to 2020-12?

Perhaps you meant items to prefixItems, or maybe I am misunderstanding? 😕

Juan Cruz Viotti · Answer 9 · Sat Feb 24 2024 00:53:18 GMT+0800 (China Standard Time)

@MeastroZI The reference was originally #/items/0, but because we rename items to prefixItems, for the schema to be valid, we should have also adjusted the reference from #/items/0 to #/prefixItems/0. The expected end result should have been this:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "array",
  "prefixItems": [
    { "type": "string" },
    { "type": "number" }
  ],
  "items": { 
    "$ref": "#/prefixItems/0" 
  }
}

Vinit Pandit · Answer 10 · Sat Feb 24 2024 00:57:35 GMT+0800 (China Standard Time)

Hasn't this problem already been addressed with the pattern

"pattern": "/items/\\d+"

"$eval": "replace(schema['$ref'], '/items/(\\d+)', '/prefixItems/$1')"

or is there a possibility that this approach might not cover all cases? If so, could you please specify which cases it might not handle, so I can gain a better understanding of the issue?

Juan Cruz Viotti · Answer 11 · Sat Feb 24 2024 04:19:52 GMT+0800 (China Standard Time)

@MeastroZI For this very trivial rename case yes, but it's very easy to construct valid JSON Schemas where that simple pattern won't do. Take this one as a silly example:

{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "type": "object",
  "properties": {
    "items": {
      "items": [
        { "type": "string" }
      ]
    },
    "extra": {
      "$ref": "#/properties/items/items/0" 
    }
  }
}

It has an object property called items which is not the actual JSON Schema keyword. In this case, you need to rename only /properties/items/items to /properties/items/prefixItems, and thus only rename the second occurrence of items in the JSON Pointer. In JSON Schema 2019-09, items can also be either a schema or an array of schemas, so you can have items be a schema that declares items as an array inside and get into a similar situation. You can probably come up with more edge cases around it.

In any case, items to prefixItems is just a simple rename upgrade example. Other JSON Schema keywords may require more than just a simple renaming, making this even harder to resolve for all cases.

Keep in mind that a tool that upgrades schemas must be able to handle ANY valid JSON Schema document that the user passes to it, and handle these tricky edge cases accordingly.

Juan Cruz Viotti · Answer 12 · Sat Feb 24 2024 04:22:35 GMT+0800 (China Standard Time)

The definitions to $defs case in the Alterschema issue I shared is even trickier, because you cannot rely on the next pointer component being an integer to improve the pattern, like we do for items to prefixItems.

Juan Cruz Viotti · Answer 13 · Sat Feb 24 2024 04:25:25 GMT+0800 (China Standard Time)

Here is a fun one that is valid and breaks the \\d part of the regex:

{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "type": "object",
  "properties": {
    "foo": {
      "$ref": "#/$defs/items/0" 
    }
  },
  "$defs": {
    "items": {
      "0": {
        "type": "string"
      }
    }
  }
}
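
To make the failure concrete, here is a small sketch that applies the regex-based replacement discussed earlier to the reference in this schema. The helper name `naiveFixRef` is made up for illustration.

```javascript
// Naive fix-up: rewrite every "/items/<digits>" pair of pointer components.
function naiveFixRef(ref) {
  return ref.replace(/\/items\/(\d+)/g, '/prefixItems/$1');
}

// Correct for the trivial case, where `items` is the array-form keyword.
console.log(naiveFixRef('#/items/0')); // "#/prefixItems/0"

// Wrong for the schema above: `items` is just a name under `$defs` and `0`
// is a key of a plain object, so the upgrade renames nothing and the
// reference must stay untouched. The regex rewrites it anyway.
console.log(naiveFixRef('#/$defs/items/0')); // "#/$defs/prefixItems/0"
```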

Juan Cruz Viotti · Answer 14 · Sat Feb 24 2024 04:29:36 GMT+0800 (China Standard Time)

What I'm thinking about is that we can statically analyze the schema first, and know what each component of the pointers means (e.g. does the /items part of #/$defs/items correspond to the actual items 2019-09 applicator in array form?). That plus additional semantics around what the transformation does could help us resolve every case
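
One way to sketch that static analysis: walk the reference through the original 2019-09 schema one component at a time, tracking whether each component is a keyword or just a name or index inside a container. Everything below is an illustrative assumption rather than an agreed design: the function names, the small keyword subset, the handling of only same-document `#/...` references without percent-encoded characters, and the fact that only the items to prefixItems rename is covered.

```javascript
// Illustrative subset of 2019-09 keywords whose value is a map or an array
// of subschemas, so the pointer component after them is a name or an index.
const CONTAINER_KEYWORDS = new Set([
  'properties', 'patternProperties', '$defs', 'definitions',
  'dependentSchemas', 'allOf', 'anyOf', 'oneOf'
]);

const has = (node, key) =>
  typeof node === 'object' && node !== null &&
  Object.prototype.hasOwnProperty.call(node, key);

// JSON Pointer escaping (RFC 6901): "~1" is "/" and "~0" is "~".
const unescapeToken = (token) => token.replace(/~1/g, '/').replace(/~0/g, '~');
const escapeToken = (token) => token.replace(/~/g, '~0').replace(/\//g, '~1');

// Walk `ref` through the ORIGINAL schema and rename only the `items`
// components that are the real keyword in array form.
function fixItemsRef(schema, ref) {
  if (!ref.startsWith('#/')) return ref; // only local pointers are handled
  const tokens = ref.slice(2).split('/').map(unescapeToken);
  const output = [];
  let node = schema; // invariant: `node` sits at a schema location
  let index = 0;
  while (index < tokens.length) {
    const token = tokens[index];
    if (!has(node, token)) return ref; // does not resolve: leave untouched
    const value = node[token];
    const isArrayItems = token === 'items' && Array.isArray(value);
    output.push(isArrayItems ? 'prefixItems' : token);
    node = value;
    index += 1;
    // Containers consume one more component (a property name or an array
    // index) that must never be interpreted as a keyword.
    if ((isArrayItems || CONTAINER_KEYWORDS.has(token)) && index < tokens.length) {
      const child = tokens[index];
      if (!has(node, child)) return ref;
      output.push(child);
      node = node[child];
      index += 1;
    }
  }
  return '#' + output.map((token) => '/' + escapeToken(token)).join('');
}

// The `$defs` example from this thread: `items` is a definition name here.
const tricky = {
  properties: { foo: { $ref: '#/$defs/items/0' } },
  $defs: { items: { 0: { type: 'string' } } }
};
console.log(fixItemsRef(tricky, '#/$defs/items/0')); // "#/$defs/items/0"
```

Because the walk knows that the component after `$defs` or `properties` is a name, it leaves `#/$defs/items/0` alone while still rewriting `#/properties/items/items/0` to `#/properties/items/prefixItems/0` in the earlier example.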

Suprith KG · Answer 15 · Sat Feb 24 2024 06:39:18 GMT+0800 (China Standard Time)

What I'm thinking about is that we can statically analyze the schema first, and know what each component of the pointers mean (i.e. does the /items part of #/$defs/items correspond to the actual items 2019-09 applicator in array form?) That plus additional semantics around what the transformation does could help us resolve every case
Hi, so instead of handling every single case for the keywords to be transformed, it is better to make checks based on the semantic hierarchical flow. Am I right? Like checking whether it's an array or an object, whether it's a real item, and then casting the 0 to a string? Is that what semantics means?

Suprith KG · Answer 16 · Sat Feb 24 2024 06:42:05 GMT+0800 (China Standard Time)

Okay, I'll complete this right now.


Juan Cruz Viotti · Answer 17 · Sat Feb 24 2024 09:04:26 GMT+0800 (China Standard Time)

Hi, so instead of handling every single case for the keywords to be transformed, it is better to make checks based on the semantic hierarchical flow. Am I right? Like checking whether it's an array or an object, whether it's a real item, and then casting the 0 to a string? Is that what semantics means?

Not 100% sure what you mean, but what I mean by semantics is being able to statically analyze the actual transformation DSL and actually understand what it does. For example, you cannot very easily tell from a JSON-e template that the template is actually a property rename. And if we can tell that e.g. a rule is actually a rename from A to B, then we might know how to handle the reference fix-ups.

Coming back to the items to prefixItems example we've been discussing so far, this is the corresponding JSON-e rule we have in Alterschema:

{
  "$merge": [
    { "$eval": "omit(schema, 'items')" },
    {
      "prefixItems": {
        "$eval": "schema.items"
      }
    }
  ]
}

What if instead of that weird-looking low-level complex JSON template, we instead had:

[
  { "type": "rename", "from": "items", "to": "prefixItems" }
]

The latter is a LOT more machine readable.

I guess the main challenge is that leaving the simple prefixItems rule aside, some upgrade rules are more complex and involve even more cryptic JSON-e templates that do more than just renames. So the problem statement is: can we come up with a set of higher level operations that capture everything we need, AND that is machine readable enough for us to deterministically do $ref fix-ups in every possible case?
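
As a rough illustration of why the declarative form is easier to work with, here is a minimal interpreter for the rename operation shown above. The function name `applyOperations` and the returned `renames` list are hypothetical, only the "rename" type is handled, and there is no condition support yet (a real items rule would have to check that items is in array form first).

```javascript
// Apply a list of { type: "rename", from, to } operations to ONE subschema.
// Returns the new subschema plus the renames that actually happened, which
// is the information a later $ref fix-up pass would need.
function applyOperations(schema, operations) {
  const result = { ...schema };
  const renames = [];
  for (const operation of operations) {
    if (operation.type !== 'rename') {
      throw new Error(`Unsupported operation type: ${operation.type}`);
    }
    if (!Object.prototype.hasOwnProperty.call(result, operation.from)) {
      continue; // nothing to rename in this subschema
    }
    result[operation.to] = result[operation.from];
    delete result[operation.from];
    renames.push({ from: operation.from, to: operation.to });
  }
  return { schema: result, renames };
}

// Order matters: `items` has to move out of the way before
// `additionalItems` can take its name.
const { schema, renames } = applyOperations(
  { type: 'array', items: [{ type: 'string' }], additionalItems: { type: 'number' } },
  [
    { type: 'rename', from: 'items', to: 'prefixItems' },
    { type: 'rename', from: 'additionalItems', to: 'items' }
  ]
);
console.log(schema, renames);
```

Unlike a JSON-e template, the interpreter knows exactly which keys were renamed, so that list can be handed to whatever step rewrites the affected pointers.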

Suprith KG · Answer 18 · Sat Feb 24 2024 16:43:58 GMT+0800 (China Standard Time)

So I'd say the phases in this project are like this:

Research JSON Schema transformation rules, categorize them, etc

Come up with a higher-level transformation language than JSON-e that carries semantics about how we are actually transforming the schema (I was thinking something similar to JSON Patch (https://jsonpatch.com))

Then do a prototype of implementing upgrade rules with this language, ensuring it solves the limitations of Alterschema

If we have more time, we use this language to attempt some level of downgrading support, etc

@jviotti one question on this: should the high-level transformation language call JSON-e in the backend? In other words, should the high-level one be written on top of JSON-e itself?

Juan Cruz Viotti · Answer 19 · Sat Feb 24 2024 21:37:40 GMT+0800 (China Standard Time)

@Era-cell Maybe. I'm open to both building it on top of JSON-e or as a standalone thing. Whatever is easier I guess

Benjamin Granados · Answer 20 · Tue Feb 27 2024 19:04:13 GMT+0800 (China Standard Time)

Thanks a lot for joining JSON Schema org for this edition of GSoC!!

Qualification tasks will be published as comments in the project ideas by Thursday/Friday of this week. In addition, I'd like to invite you to an office hours session this Thursday at 18:30 UTC where we'll present the ideas and the relevant dates to consider at this stage of the program.

Please use this link to join the session:
🌐 Zoom
📅 2024-02-29 18:30 UTC

See you there!

Juan Cruz Viotti · Answer 21 · Tue Feb 27 2024 21:56:28 GMT+0800 (China Standard Time)

For the qualifying task, just to echo back what I said before: the main thing we want to see on proposals is that you have a good grasp on what the problem of upgrading JSON Schemas is and are capable of understanding the upgrade rules that would need to be implemented.

So for that, you can focus only on 2019-09 to 2020-12 for the proposal (we'll cover other drafts later), list down the transformation rules that need to happen on all those drafts, and try to categorize them based on different criteria to understand them better. For example, what vocabulary they involve, what type of operation they are (rename, wrap, etc), whether they affect other sibling or non sibling keywords, etc. Be creative! Good grouping criteria can surface patterns that we might not be thinking about and that could influence the DSL. You can present this as a spreadsheet, list, or any form you want.

Then, once accepted, we will continue building on this analysis to design the DSL, and finally implement it. If we did the previous phases well (mainly the one on understanding and categorizing the transformation rules), the rest will be easy

Vinit Pandit · Answer 22 · Thu Feb 29 2024 10:38:21 GMT+0800 (China Standard Time)

{
  "$schema": "https://json-schema.org/draft/2020-12",
  "$id": "https://example.com/anotherthing/agains/customer",

  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "phone": { "$ref": "/schema/common#/$defs/phone" },
    "address": { "$ref": "/schema/address" }
  },

  "$defs": {
    "https://example.com/schema/address": {
      "$id": "https://example.com/schema/address",

      "type": "object",
      "properties": {
        "address": { "type": "string" },
        "city": { "type": "string" },
        "postalCode": { "$ref": "/schema/common#/$defs/usaPostalCode" },
        "state": { "$ref": "#/$defs/states" }
      },

      "$defs": {
        "states": {
          "enum": [4, 4]
        }
      }
    },
    "https://example.com/schema/common": {
      "$schema": "https://json-schema.org/draft/2019-09",
      "$id": "https://example.com/schema/common",

      "$defs": {
        "phone": {
          "type": "number"
        },
        "usaPostalCode": {
          "type": "string",
          "pattern": "^[0-9]{5}(?:-[0-9]{4})?$"
        },
        "unsignedInt": {
          "type": "integer",
          "minimum": 0
        }
      }
    }
  }
}

@jviotti I am not able to understand how, in this case, this $ref under:

"phone": { "$ref": "/schema/common#/$defs/phone" }

which has the relative path, gets resolved by the schema validator. I mean, how is the base URL for this calculated even if there is nothing common in the relative path under $ref and the $id of the root?

Suprith KG · Answer 23 · Thu Feb 29 2024 13:51:37 GMT+0800 (China Standard Time)

Did you try to run it? I am thinking this is related to how schemas are stored

Vinit Pandit · Answer 24 · Thu Feb 29 2024 16:09:08 GMT+0800 (China Standard Time)

@Era-cell, I have read somewhere that $ref is resolved by directly pointing to the schema part it refers to. So now my question is: how does the schema validator resolve this $ref with a relative path? Even if the schema validator stores these schemas in the definitions part or in some other way under the hood, it still needs to resolve the $ref by locating the schema it references.

Suprith KG · Answer 25 · Thu Feb 29 2024 16:25:09 GMT+0800 (China Standard Time)

As per the documentation: refs are encapsulated from the parent schema but defs aren't, so the annotation results of an external schema should affect only the validation results. If a subschema with $ref fails, the schema is invalidated.


Vinit Pandit · Answer 26 · Thu Feb 29 2024 17:02:01 GMT+0800 (China Standard Time)

The schema I provided is not invalidating; it's working and successfully validating the JSON data.

You can try it here:
https://www.jsonschemavalidator.net/

Edited: Sorry, I am typing from my phone, so you may find typos in my messages

Juan Cruz Viotti · Answer 27 · Thu Feb 29 2024 20:56:42 GMT+0800 (China Standard Time)

@MeastroZI Your reference, /schema/common#/$defs/phone is a URI reference, where /schema/common is the URI path and #/$defs/phone is the URI fragment. Furthermore, that URI reference is relative.

According to JSON Schema use of URI and the URI RFC, that relative URI is resolved taking https://example.com/anotherthing/agains/customer (the $id of the schema resource that contains such reference), as the base URI.

Following standard URI behavior, resolving /schema/common#/$defs/phone against https://example.com/anotherthing/agains/customer yields https://example.com/schema/common#/$defs/phone. Then, when resolving that reference, JSON Schema will look for https://example.com/schema/common, which is an embedded schema resource in the schema you shared, and from there resolve #/$defs/phone as a JSON Pointer.

If URI behavior is the confusing part, I recommend reading the URI RFC: https://www.rfc-editor.org/rfc/rfc3986
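
The resolution steps above can be reproduced with the standard `URL` class available in Node.js and browsers. It follows the WHATWG URL rules, which agree with RFC 3986 for a simple case like this one; the variable names are only illustrative.

```javascript
// Base URI: the $id of the schema resource that contains the reference.
const base = 'https://example.com/anotherthing/agains/customer';
const resolved = new URL('/schema/common#/$defs/phone', base);

// Absolute URI of the embedded schema resource to look up.
const resource = resolved.origin + resolved.pathname;
// JSON Pointer to evaluate inside that resource.
const pointer = decodeURIComponent(resolved.hash.slice(1));

console.log(resolved.href); // "https://example.com/schema/common#/$defs/phone"
console.log(resource);      // "https://example.com/schema/common"
console.log(pointer);       // "/$defs/phone"
```

Because the reference starts with `/`, it replaces the whole path of the base URI, which is why nothing from `/anotherthing/agains/customer` survives in the result.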

Vinit Pandit · Answer 28 · Fri Mar 01 2024 21:44:09 GMT+0800 (China Standard Time)

const transformRule = [
    {
        referencTraverser: true,
        path: "properties/*",
        conditions: [{ "isKey": "$ref" }],
        refConditions: [{ "isKey": "items", "hasSibling": ["type", "array"] }],
        updateRefPart: "prefixItems"
    },
    {
        path: "*",
        conditions: [{ "isKey": "items", "hasSibling": ["type", "array"] }],
        operations: {
            "editKey": "prefixItems"
        }
    },
    {
        path: "$schema",
        operations: {
            "updateValue": "https://json-schema.org/draft/2020-12/schema"
        }
    }
]

const jasonobj = {
    "$schema": "https://json-schema.org/draft/2019-09/schema",
    "type": "object",
    "properties": {
        "items": {
            "type": "array",
            "items": [
                { "type": "string" }
            ]
        },
        "extra": {
            "$ref": "#/properties/items/items/0"
        }
    },
    "ooos": {
        "items2": {
            "type": "array",
            "items": []
        },
        "item3": {
            "items4": {
                "items5": {
                    "type": "array",
                    "items": []
                }
            }
        }
    }
}

// convert() is my own implementation and is not included in this snippet
const result = convert(transformRule, jasonobj)
console.log('\n')
console.log('*******************************Logs*****************************************')
console.log('\n\n\n\n\n\n')
console.log('*******************************Result****************************************')
console.log( JSON.stringify (result , null , 2))
console.log('*******************************Result****************************************')
console.log('\n')

and here is the output

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "items": {
      "type": "array",
      "prefixItems": [
        {
          "type": "string"
        }
      ]
    },
    "extra": {
      "$ref": "#/properties/items/prefixItems/0"
    }
  },
  "ooos": {
    "items2": {
      "type": "array",
      "prefixItems": []
    },
    "item3": {
      "items4": {
        "items5": {
          "type": "array",
          "prefixItems": []
        }
      }
    }
  }
}

Hi @jviotti, I have a doubt about the meaning of the JSON DSL. Could you please take a look at this code? It's a snippet of my work towards DSL. Actually, I want to know if my code can do something like this. Is it considered as a DSL? If not, how would you technically define a DSL?

And sorry for the previous comment. One more thing I am hesitant about is asking this many questions. Is it okay to ask this many questions or are they silly? I want to openly express my concern about it.

Juan Cruz Viotti · Answer 29 · Fri Mar 01 2024 23:32:59 GMT+0800 (China Standard Time)

@MeastroZI

I have a doubt about the meaning of the JSON DSL. Could you please take a look at this code? It's a snippet of my work towards DSL. Actually, I want to know if my code can do something like this. Is it considered as a DSL? If not, how would you technically define a DSL?

Yeah, exactly, you are thinking about it in the right direction. Your transformRule JSON example is definitely a valid DSL.

And sorry for the previous comment. One more thing I am hesitant about is asking this many questions. Is it okay to ask this many questions or are they silly? I want to openly express my concern about it.

Please ask as many questions as you need. That's the whole point of this phase and I'm sure other people reading this thread would benefit as well. Asking lots of questions is definitely better than not asking them.

Vinit Pandit · Answer 30 · Sun Mar 03 2024 00:07:57 GMT+0800 (China Standard Time)

@jviotti can you explain this

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "array",
  "prefixItems": [{ "type": "string" }, { "type": "string" }],
  "not": {
    "items": {
      "not": { "type": "string", "minLength": 3 }
    }
  },
  "unevaluatedItems": false
}

especially this part:

"not": {
    "items": {
      "not": { "type": "string", "minLength": 3 }
    }

My understanding is that it dictates that there must not be any items in the array that are strings with a length less than 3. Therefore, the schema should only accept arrays where all elements have a minimum length of 3. However, it seems to also accept arrays like ["axd", "d"]. Could you clarify this?

Suprith KG · Answer 31 · Sun Mar 03 2024 01:24:07 GMT+0800 (China Standard Time)

Also, the unevaluatedItems behaviour is a bit weird:

registerSchema({
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "http://example.com/lets_move",
    "type": "array",
    "prefixItems": [{
        "const": "aaa"
    }],
    "items": {
        "type": "string",
        "anyOf": [
            { "pattern": "^a" },
            { "pattern": "^b" }
        ]
    },
    "uniqueItems": true,
    "unevaluatedItems": {
        "type": "string",
        "pattern": "^y"
    }
})

For the instance ["aaa", "ya"]:
Shouldn't "ya" go to unevaluatedItems, match "^y", and produce true? Why does it give false here?
In both examples, the presence of the items keyword is what makes it confusing.

Vinit Pandit · Answer 32 · Sun Mar 03 2024 02:15:01 GMT+0800 (China Standard Time)

@Era-cell
unevaluatedItems only applies to array elements that have not been evaluated. Because you use items before unevaluatedItems, all the array elements get successfully evaluated, so no element is left unevaluated. That's why your instance is not validating. You have to apply the keywords logically: there is no point in adding unevaluatedItems if every element is already getting evaluated.
🙂
If I am wrong, please correct me.

Suprith KG · Answer 33 · Sun Mar 03 2024 02:43:04 GMT+0800 (China Standard Time)

@Era-cell unevaluatedItems only applies to array elements that have not been evaluated. Because you use items before unevaluatedItems, all the array elements get successfully evaluated, so no element is left unevaluated. That's why your instance is not validating. You have to apply the keywords logically: there is no point in adding unevaluatedItems if every element is already getting evaluated. 🙂 If I am wrong, please correct me.

But the order of keywords doesn't matter as per the docs, and:
These instance items or properties may have been unsuccessfully evaluated against one or more adjacent keyword subschemas, such as when an assertion in a branch of an "anyOf" fails. Such failed evaluations are not considered to contribute to whether or not the item or property has been evaluated. Only successful evaluations are considered.
-- it says only successful evaluations are considered to be evaluated

Vinit Pandit · Answer 34 · Sun Mar 03 2024 02:48:46 GMT+0800 (China Standard Time)

@Era-cell When you set unevaluatedItems to false in your code and then run your instance, you will not get an error related to unevaluated elements; you will get an error related to the items keyword.

That means items takes care of all the elements that are not covered by prefixItems and does not let the flow reach the unevaluatedItems keyword.

Try it here https://json-schema.hyperjump.io/

Vinit Pandit · Answer 35 · Sun Mar 03 2024 02:53:42 GMT+0800 (China Standard Time)

And even if you remove the unevaluatedItems keyword, you will get the same error.
Guess why!

Same thing: because the items keyword takes care of all the elements that are not covered by prefixItems.

Suprith KG · Answer 36 · Sun Mar 03 2024 03:16:36 GMT+0800 (China Standard Time)

And even if you remove the unevaluatedItems keyword, you will get the same error. Guess why!

Same thing: because the items keyword takes care of all the elements that are not covered by prefixItems.

Yeah, this was my initial thought..
But At this point presence of "items" keyword will not let any of the values to be unevaluated, as per your assumption

Suprith KG · Answer 37 · Sun Mar 03 2024 03:18:17 GMT+0800 (China Standard Time)

@Era-cell unevaluatedItems only applies to array elements that have not been evaluated. Because you use items before unevaluatedItems, every array element is successfully evaluated, so no element is left unevaluated. That is why your instance is not validating. You have to apply the keywords logically: there is no point in adding unevaluatedItems if every element is already evaluated. 🙂 If I am wrong, please correct me.

But the order of keywords doesn't matter as per the docs, and: These instance items or properties may have been unsuccessfully evaluated against one or more adjacent keyword subschemas, such as when an assertion in a branch of an "anyOf" fails. Such failed evaluations are not considered to contribute to whether or not the item or property has been evaluated. Only successful evaluations are considered. -- it says only successful evaluations are considered to be evaluated

Is it possible to make this statement clearer? 😁

Suprith KG · Answer 38 · Sun Mar 03 2024 03:28:14 GMT+0800 (China Standard Time)

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "http://example.com/lets_uneval",
    "type": "array",
    "prefixItems": [{
        "const": "aaa"
    }],
    "items": {
        "type": "string",
        "allOf": [
            { "pattern": "^a" },
            { "pattern": "^b" }
        ]
    },
    "uniqueItems": true,
    "unevaluatedItems": {
        "type": "string",
        "pattern": "^an"
    }
}

Now, for ["aaa", "a", "bn", "an"], "an" should be left unevaluated because "a" took care of it.
I expect the result to be true, but it is false. If even this is evaluated, can I get an example where "items" is present and some values are left unevaluated?

Vinit Pandit · Answer 39 · Sun Mar 03 2024 10:24:55 GMT+0800 (China Standard Time)

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "http://example.com/lets_uneval",
    "type": "array",
    "prefixItems": [{
        "const": "aaa"
    }],
    "items": {
        "type": "string",
        "allOf": [
            { "pattern": "^a" },
            { "pattern": "^b" }
        ]
    },
    "uniqueItems": true,
    "unevaluatedItems": {
        "type": "string",
        "pattern": "^an"
    }
}

Just tell me one thing: is it possible for a string to start with a and simultaneously start with b? Because no string can start with both a and b, you are getting an error. Try this:

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "http://example.com/lets_uneval",
    "type": "array",
    "prefixItems": [{
        "const": "aaa"
    }],
    "items": {
        "type": "string",
        "allOf": [
            { "pattern": "^a" }, 
            { "pattern": "b$" }  
        ]
    },
    "uniqueItems": true,
    "unevaluatedItems": {
        "type": "string",
        "pattern": "^an"
    }
}

On this instance:
["aaa", "aab", "aaab"]

Note that the first element is matched by prefixItems, so items only applies to the remaining elements. This gives the result true, but if you add any string after the first element that does not start with a and end with b, that element is caught by the items keyword. As I said earlier, items checks all the elements not covered by prefixItems and does not let any element reach unevaluatedItems!

Please correct me if I am wrong 😺

Suprith KG · Answer 40 · Mon Mar 04 2024 02:17:10 GMT+0800 (China Standard Time)

@jviotti, I have some more questions about Alterschema:
Why are there rules from 2019 to 2019 and from 2020 to 2020? What are these needed for?
Why did you choose JSON-e over JavaScript functions? Because it was more intuitive?
Is there a need for an imperative DSL, or did you mean a declarative DSL like OOP (which gives a higher level of abstraction)?
Are you going to keep using Alterschema, or will it be abandoned?

Juan Cruz Viotti · Answer 41 · Tue Mar 05 2024 02:30:55 GMT+0800 (China Standard Time)

@MeastroZI

@jviotti can you explain this

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "array",
  "prefixItems": [{ "type": "string" }, { "type": "string" }],
  "not": {
    "items": {
      "not": { "type": "string", "minLength": 3 }
    }
  },
  "unevaluatedItems": false
}

My understanding is that it dictates that there must not be any items in the array that are strings with a length less than 3. Therefore, the schema should only accept arrays where all elements have a minimum length of 3. However, it seems to also accept arrays like ["axd", "d"]. Could you clarify this?

That schema looks overly complicated. Maybe what you want is this instead?

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "array",
  "items": {
    "minLength": 3
  }
}

Juan Cruz Viotti · Answer 42 · Tue Mar 05 2024 02:32:30 GMT+0800 (China Standard Time)

@Era-cell

Also the unevaluatedItems behaviour is a bit weird:

The unevaluatedItems behavior depends on other adjacent array-related keywords. As its name implies, unevaluatedItems will only kick in for array items that have not been evaluated by adjacent array keywords, so the presence of items and prefixItems will indeed affect its behavior.

Juan Cruz Viotti · Answer 43 · Tue Mar 05 2024 02:36:34 GMT+0800 (China Standard Time)

@Era-cell

I have some more questions in alterschema: Why are rules mentioned 2019 to 2019, 2020 to 2020 -- what is the need of these

These perform simplifications within the same version to make it easier to process the other rules. For example, you could simplify the use of certain keywords on the input schema without changing the version, before you attempt to upgrade it.

Why did you opt to choose json-e over javascript functions.. because it was more intuitive?

The whole point of this project is to make rule definitions programming language agnostic. We don't want to just create an upgrade tool for JavaScript, but one that is embeddable and implementable on ANY language out there. That's why the rules are pure JSON.

Is there a need of imperative DSL or is declarative DSL like OOP is what you meant (which gives higher level of abstraction) ?

Not sure I follow this. Can you give me an example?

Are you going to use alterschema or that will be abandoned?

I will. The idea is for the JSON-based rules to be moved to the JSON Schema org while Alterschema is (one of many, potentially?) an implementation of the actual engine.

Suprith KG · Answer 44 · Tue Mar 05 2024 02:37:36 GMT+0800 (China Standard Time)

@Era-cell

Also the unevaluatedItems behaviour is a bit weird:

The unevaluatedItems behavior depends on other adjacent array-related keywords. As its name implies, unevaluatedItems will only kick in for array items that have not been evaluated by adjacent array keywords, so the presence of items and prefixItems will indeed affect its behavior.

@jviotti
My query on this is:
In the presence of the items keyword, wouldn't items evaluate each and every instance value, so that none of them is left unevaluated?
(Can you give an example where, even in the presence of the "items" keyword, some unevaluated values are left over?)

Juan Cruz Viotti · Answer 45 · Tue Mar 05 2024 02:43:24 GMT+0800 (China Standard Time)

In the presence of the items keyword, wouldn't items evaluate each and every instance value, so that none of them is left unevaluated?

Correct. Maybe this example helps clarify that: https://github.com/json-schema-org/JSON-Schema-Test-Suite/blob/main/tests/draft2020-12/unevaluatedItems.json#L64-L78
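
To make the point concrete, here is a minimal Python sketch (not a validator) of which array indexes are left for unevaluatedItems in 2020-12 when only adjacent prefixItems and items are considered. The function name leftover_indexes is hypothetical, the sketch assumes every applied subschema validated successfully, and in-place applicators such as allOf, $ref, or contains are deliberately ignored.

```python
def leftover_indexes(schema: dict, instance: list) -> list:
    """Indexes that adjacent `prefixItems`/`items` did not evaluate (2020-12).

    Assumption: every subschema that was applied succeeded. If `items`
    fails on some element, the whole schema already fails, so
    `unevaluatedItems` cannot rescue that element.
    """
    if "items" in schema:
        # `items` applies to every element after the `prefixItems` prefix,
        # so nothing is left over for `unevaluatedItems`.
        return []
    covered = min(len(schema.get("prefixItems", [])), len(instance))
    return list(range(covered, len(instance)))


instance = ["aaa", "a", "bn", "an"]
with_items = {"prefixItems": [{"const": "aaa"}], "items": {"type": "string"}}
without_items = {"prefixItems": [{"const": "aaa"}]}

print(leftover_indexes(with_items, instance))     # []
print(leftover_indexes(without_items, instance))  # [1, 2, 3]
```

Under these assumptions, unevaluatedItems only has work to do once items is removed from the same schema object.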

Suprith KG · Answer 46 · Tue Mar 05 2024 02:43:32 GMT+0800 (China Standard Time)

@Era-cell

I have some more questions in alterschema:Why are rules mentioned 2019 to 2019, 2020 to 2020 -- what is the need of these

These perform simplifications within the same version to make it easier to process the other rules. i.e. you could simplify the use of certain keywords on the input schema without changing the version, before you attempt to upgrade it.

Why did you opt to choose json-e over javascript functions.. because it was more intuitive?

The whole point of this project is to make rule definitions programming language agnostic. We don't want to just create an upgrade tool for JavaScript, but one that is embeddable and implementable on ANY language out there. That's why the rules are pure JSON.

Is there a need of imperative DSL or is declarative DSL like OOP is what you meant (which gives higher level of abstraction) ?

Not sure I follow this. Can you give me an example?

Are you going to use alterschema or that will be abandoned?

I will. The idea is for the JSON-based rules to be moved to the JSON Schema org while Alterschema is (one of many, potentially?) an implementation of the actual engine.

Like, do we need to use parsers, lexers, and a new grammar defining the language, OR use an abstraction over JSON-e or JavaScript (or any other language, to create functions with arguments) itself?

Juan Cruz Viotti · Answer 47 · Tue Mar 05 2024 02:44:59 GMT+0800 (China Standard Time)

@Era-cell

Like, do we need to use parsers, lexers, and a new grammar defining the language, OR use an abstraction over JSON-e or JavaScript (or any other language, to create functions with arguments) itself?

It should be all JSON based. No need for a new grammar. Just use JSON's grammar. But don't embed an actual programming language like JavaScript in the JSON. JSON-e is one valid way of doing it. It expresses the transformations purely using JSON.
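
As a rough illustration of "rules as pure JSON", the Python sketch below loads a rule from JSON text and applies it with a tiny interpreter. The rule shape (when, hasKeyword, rename) and the function apply_rule are invented for this example; they are neither Alterschema's actual rule format nor JSON-e syntax.

```python
import copy
import json

# A hypothetical rule, expressed as plain JSON so any language can load it.
RULE_JSON = """
{
  "description": "Rename definitions to $defs",
  "when": { "hasKeyword": "definitions" },
  "rename": { "from": "definitions", "to": "$defs" }
}
"""


def apply_rule(rule: dict, schema: dict) -> dict:
    """Return a transformed copy of `schema`; the input is never mutated."""
    result = copy.deepcopy(schema)
    source = rule["rename"]["from"]
    target = rule["rename"]["to"]
    # The rule fires only when its condition holds for this schema object.
    if rule["when"]["hasKeyword"] in result and source in result:
        result[target] = result.pop(source)
    return result


rule = json.loads(RULE_JSON)
old = {"definitions": {"name": {"type": "string"}}}
new = apply_rule(rule, old)
print(new)  # {'$defs': {'name': {'type': 'string'}}}
```

The interpreter is the only language-specific part; the rule itself stays portable. This sketch does not rewrite $ref pointers such as "#/definitions/name", which is the kind of unresolved edge case mentioned in the issue description.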

Suprith KG · Answer 48 · Sat Mar 09 2024 14:20:34 GMT+0800 (China Standard Time)

Hi @jviotti, when the algorithm/DSL is included in the JSON Schema org, will access to external JSON Schema documents be provided? For example:

"$ref":"other.json#/$defs/items/0"

Here the schema resource isn't present in the document being altered. At this point, does the external schema document (which is an external resource) also need to be altered?

Juan Cruz Viotti · Answer 49 · Mon Mar 11 2024 20:59:26 GMT+0800 (China Standard Time)

Hi @Era-cell

whose schema resource isn't present in the document which is being altered, at this point the external schema document (which is an external resource) also needs to be altered?

Great question! Yes in both cases:

A JSON Schema is allowed to externally reference another JSON Schema that makes use of a different draft. For example, you can have a JSON Schema 2020-12 that externally references a JSON Schema Draft 4. So in that case, it is not really required to upgrade the other schema, and we can simply ignore it if we don't have access to it.
That said, while this cross-version referencing is supposed to work, I think many implementations out there don't properly support it, and the JSON Schema test suite doesn't cover it either. For these cases, what you can do is perform JSON Schema Bundling (https://json-schema.org/blog/posts/bundling-json-schema-compound-documents) before upgrading that schema. Bundling will bring all externally referenced schemas into a single schema with nested schema resources, and then we upgrade them all together.

But in both cases, our upgrader shouldn't really mind. If it's passed a schema with unresolved remote references, it will do what it can, and if it's passed a bundled schema, it will transform the entire thing.

Vinit Pandit · Answer 50 · Mon Mar 11 2024 21:49:13 GMT+0800 (China Standard Time)

Hi, @jviotti! I have one more question about bundling schemas. Can I assume that the name (key) of a schema in $defs will always be the $id of that schema, or can it be anything? For example, in this schema the names under $defs are set to the $id of each schema:

{
  "$id": "https://jsonschema.dev/schemas/examples/non-negative-integer-bundle",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "description": "Must be a non-negative integer",
  "$comment": "A JSON Schema Compound Document. Aka a bundled schema.",
  "$defs": {
    "https://jsonschema.dev/schemas/mixins/integer": {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "$id": "https://jsonschema.dev/schemas/mixins/integer",
      "description": "Must be an integer",
      "type": "integer"
    },
    "https://jsonschema.dev/schemas/mixins/non-negative": {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "$id": "https://jsonschema.dev/schemas/mixins/non-negative",
      "description": "Not allowed to be negative",
      "minimum": 0
    },
    "nonNegativeInteger": {
      "allOf": [
        {
          "$ref": "/schemas/mixins/integer"
        },
        {
          "$ref": "/schemas/mixins/non-negative"
        }
      ]
    }
  },
  "$ref": "#/$defs/nonNegativeInteger"
}

Matthew Adams · Answer 51 · Mon Mar 11 2024 21:56:51 GMT+0800 (China Standard Time)

It can be anything.

Matthew Adams · Answer 52 · Mon Mar 11 2024 21:59:29 GMT+0800 (China Standard Time)

(The value of the $ref is applied to the current scope and the schema is resolved from that reference.)
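
A small Python sketch of that resolution step, using a trimmed version of the bundle above with the $defs key deliberately changed to "anything-goes". It assumes the embedded $id values are already absolute URIs; the variable names are illustrative only.

```python
from urllib.parse import urljoin

bundle = {
    "$id": "https://jsonschema.dev/schemas/examples/non-negative-integer-bundle",
    "$defs": {
        "anything-goes": {
            "$id": "https://jsonschema.dev/schemas/mixins/integer",
            "type": "integer",
        },
        "nonNegativeInteger": {"allOf": [{"$ref": "/schemas/mixins/integer"}]},
    },
}

# Index embedded resources by their $id, ignoring the $defs key entirely.
by_id = {sub["$id"]: sub for sub in bundle["$defs"].values() if "$id" in sub}

ref = bundle["$defs"]["nonNegativeInteger"]["allOf"][0]["$ref"]
target = urljoin(bundle["$id"], ref)  # resolve the $ref against the base URI

print(target)         # https://jsonschema.dev/schemas/mixins/integer
print(by_id[target])  # the embedded integer schema
```

The lookup succeeds even though the $defs key no longer matches the $id, which is why the key can be anything.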

Suprith KG · Answer 53 · Tue Mar 12 2024 02:32:00 GMT+0800 (China Standard Time)

Hi @Era-cell

whose schema resource isn't present in the document which is being altered, at this point the external schema document (which is an external resource) also needs to be altered?

Great question! Yes in both cases:

A JSON Schema is allowed to externally reference another JSON Schema that makes use of a different draft. For example, you can have a JSON Schema 2020-12 that externally references a JSON Schema Draft 4. So in that case, it is not really required to upgrade the other schema, and we can simply ignore it if we don't have access to it.

That said, while this cross-version referencing is supposed to work, I think many implementations out there don't properly support it, and the JSON Schema test suite doesn't cover it either. For these cases, what you can do is perform JSON Schema Bundling (https://json-schema.org/blog/posts/bundling-json-schema-compound-documents) before upgrading that schema. Bundling will bring all externally referenced schemas into a single schema with nested schema resources, and then we upgrade them all together.

But in both cases, our upgrader shouldn't really mind. If it's passed a schema with unresolved remote references, it will do what it can, and if it's passed a bundled schema, it will transform the entire thing.

Okay, so if we have access to the external resource and it is resolved, we don't change the external schema,
but we bundle it into the present document itself, right?
Because the user may use the external schema for other purposes too, right?

Juan Cruz Viotti · Answer 54 · Tue Mar 12 2024 04:30:36 GMT+0800 (China Standard Time)

Keep in mind the project would not be able to "modify" any schema in place. What it does is create a copy of the input schema with the given transformations. So:

If the schema is bundled, you transform the entire thing, including the bundled resources
If the schema is NOT bundled, you just transform the immediate schema only
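
The two cases can be sketched in Python as follows. This is only an illustration under stated assumptions: the functions upgrade and _retarget are hypothetical, and rewriting $schema stands in for the real transformation rules.

```python
import copy

TARGET = "https://json-schema.org/draft/2020-12/schema"


def upgrade(schema: dict) -> dict:
    """Return an upgraded copy; the caller's schema is left untouched."""
    result = copy.deepcopy(schema)
    _retarget(result)
    return result


def _retarget(resource: dict) -> None:
    # Placeholder for real rules: only the dialect identifier is rewritten.
    if "$schema" in resource:
        resource["$schema"] = TARGET
    # Bundled resources live inside the document, so they are visited too.
    for embedded in resource.get("$defs", {}).values():
        if isinstance(embedded, dict):
            _retarget(embedded)


bundled = {
    "$schema": TARGET,
    "$ref": "https://example.com/legacy",
    "$defs": {
        "legacy": {
            "$schema": "http://json-schema.org/draft-07/schema#",
            "$id": "https://example.com/legacy",
            "type": "string",
        }
    },
}
unbundled = {
    "$schema": "https://json-schema.org/draft/2019-09/schema",
    "$ref": "https://example.com/legacy",
}

print(upgrade(bundled)["$defs"]["legacy"]["$schema"])  # the 2020-12 URI
print(upgrade(unbundled)["$ref"])                      # external $ref is untouched
print(bundled["$defs"]["legacy"]["$schema"])           # input still says draft-07
```

In the unbundled case the external schema is never fetched or modified; only the immediate document is copied and transformed.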

Benjamin Granados · Answer 55 · Mon Mar 18 2024 17:37:19 GMT+0800 (China Standard Time)

🚩 IMPORTANT INSTRUCTIONS REGARDING HOW AND WHERE TO SUBMIT YOUR APPLICATION 🚩

Please join this discussion in the JSON Schema Slack to get the latest, very important details on how best to submit your application to JSON Schema.

See communication here.

Suprith KG · Answer 56 · Tue Mar 19 2024 02:02:05 GMT+0800 (China Standard Time)

Hi, @jviotti where should the qualification task be submitted, and what is the deadline for it?

Juan Cruz Viotti · Answer 57 · Tue Mar 19 2024 03:07:58 GMT+0800 (China Standard Time)

@Era-cell I believe there is a GSoC portal that you should use. @benjagm Can you clarify?

Suprith KG · Answer 58 · Tue Mar 19 2024 04:31:36 GMT+0800 (China Standard Time)

@Era-cell I believe there is a GSoC portal that you should use. @benjagm Can you clarify?

@jviotti I guess that is for the proposal, should I embed qualification task inside proposal itself..?
@benjagm

Benjamin Granados · Answer 59 · Tue Mar 19 2024 05:57:26 GMT+0800 (China Standard Time)

@Era-cell yes please. Make sure you add the details of the qualification task to the proposal! Feel free to join the #gsoc channel in our Slack workspace to get an immediate response to these types of questions.

Vinit Pandit · Answer 60 · Sun Mar 24 2024 01:46:31 GMT+0800 (China Standard Time)

Hi @jviotti,

First of all, I apologize for using the Alterschema UI to display my DSL transformation. It's only temporary!

Could you please review the transformation from the 2019 draft to the 2020 draft on this site? I've embedded the qualification task's DSL transformation code and have tried my best to cover all edge cases. However, if I've missed any, please let me know.

Juan Cruz Viotti · Answer 61 · Sun Mar 24 2024 02:25:17 GMT+0800 (China Standard Time)

@MeastroZI Not much I can comment on given a single example, but looking forward to the explanations, proposed rules, etc in the proposal!

Vinit Pandit · Answer 62 · Mon Mar 25 2024 22:36:07 GMT+0800 (China Standard Time)

@jviotti, I submitted my proposal (Name: Pandit Vinit) to JSON Schema. Could you please review it and provide any suggestions if possible?

Juan Cruz Viotti · Answer 63 · Wed Mar 27 2024 06:39:09 GMT+0800 (China Standard Time)

I will, thanks a lot for the submission! ❤️

Vinit Pandit · Answer 64 · Thu Mar 28 2024 14:23:40 GMT+0800 (China Standard Time)

@jviotti in the 2019-09 draft I am not able to find any difference between additionalItems and unevaluatedItems.
It is described here as "Similar to additionalItems, but can "see" into subschemas and across references", but when I tested this schema, additionalItems also seemed to do all of this.
Here is the example:

{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "$def": {
    "stringArray": {
      "type": "array",
      "items": {
        "type": "string"
      }
    },
    "numberArray": {
      "oneOf": [
        {
          "type": "array",
          "items": [
            {
              "type": "number"
            },
            {
              "$ref": "#/$def/stringArray"
            }
          ]
        },
        {
          "type": "boolean"
        }
      ]
    }
  },
  "type": "array",
  "items": [
    {
      "$ref": "#/$def/stringArray"
    }
  ],
  "additionalItems": {
    "$ref": "#/$def/numberArray"
  }
}

validate against : [[""] , [5 , [""]] ] and [[""] , true ]

So my question is: what is the difference between additionalItems and unevaluatedItems in the 2019-09 draft, and is there any example schema that shows the difference between them?

Juan Cruz Viotti · Answer 65 · Fri Mar 29 2024 02:32:09 GMT+0800 (China Standard Time)

@MeastroZI Take a look at the official test suite examples: https://github.com/json-schema-org/JSON-Schema-Test-Suite/blob/main/tests/draft2019-09/unevaluatedItems.json. additionalItems matches any array element not covered by an adjacent items. unevaluatedItems applies to array items that were not evaluated (as its name implies) by any other relevant keyword (whether adjacent or not).
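
A toy Python model of that difference in 2019-09, assuming every applied subschema succeeded and modelling only items, additionalItems, and allOf (no $ref, anyOf, or contains). All function names here are hypothetical. It computes which indexes of an n-element array each keyword would be applied to.

```python
def evaluated(schema: dict, n: int) -> set:
    """Indexes of an n-element array evaluated by `schema` or its allOf branches."""
    seen = set()
    items = schema.get("items")
    if isinstance(items, list):
        seen |= set(range(min(len(items), n)))  # array form covers a prefix
        if "additionalItems" in schema:
            seen |= set(range(n))               # the rest goes to additionalItems
    elif items is not None:
        seen |= set(range(n))                   # schema form covers everything
    for sub in schema.get("allOf", []):
        seen |= evaluated(sub, n)               # unevaluatedItems can see in here
    return seen


def additional_items_targets(schema: dict, n: int) -> set:
    items = schema.get("items")
    if not isinstance(items, list):
        return set()  # ignored unless an adjacent array-form `items` exists
    return set(range(len(items), n))


def unevaluated_items_targets(schema: dict, n: int) -> set:
    return set(range(n)) - evaluated(schema, n)


nested = {"allOf": [{"items": [{"type": "string"}]}]}
adjacent = {"items": [{"type": "string"}]}
n = 2  # e.g. the instance ["foo", 42]

print(additional_items_targets(nested, n))     # set()
print(unevaluated_items_targets(nested, n))    # {1}
print(additional_items_targets(adjacent, n))   # {1}
print(unevaluated_items_targets(adjacent, n))  # {1}
```

When items sits next to the keyword, both target the same index, which may be why the two looked identical in the test above. When items is nested inside allOf, only unevaluatedItems still reaches index 1.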

Vinit Pandit · Answer 66 · Thu Apr 04 2024 21:46:42 GMT+0800 (China Standard Time)

@jviotti, I need some direction on how to approach downgrading JSON Schema. Is it even possible to do this for all the dialects? With each new version, new keywords are introduced, and I'm unsure if it's feasible to replicate their behavior using the previous version.

Regarding upgrading, I've developed the DSL, and I believe it's capable of handling all upgrades. Please review the recent changes I made in the repository and provide feedback if possible.

Juan Cruz Viotti · Answer 67 · Fri Apr 05 2024 04:21:53 GMT+0800 (China Standard Time)

@MeastroZI It is not always feasible, but I think you can go a long way with it, and we can think about how to handle the problematic cases. I think if the resulting downgraded schema is a superset of the original schema (i.e. it doesn't add more constraints), then it's probably acceptable.
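
One way to picture the "superset" idea is a lossy downgrade that simply drops assertion keywords the target dialect lacks. The Python sketch below is an assumption-laden illustration, not a proposed rule set: downgrade_lossy and DROPPABLE are hypothetical, only root-level keywords are handled, and only the two unevaluated* keywords are considered.

```python
import copy

DRAFT7 = "http://json-schema.org/draft-07/schema#"
# Hypothetical, incomplete list: assertion keywords that Draft 7 does not define.
DROPPABLE = ("unevaluatedItems", "unevaluatedProperties")


def downgrade_lossy(schema: dict):
    """Return (downgraded copy, dropped keywords), looking at the root object only."""
    result = copy.deepcopy(schema)
    dropped = [key for key in DROPPABLE if key in result]
    for key in dropped:
        # Removing a root-level assertion can only accept more instances.
        del result[key]
    result["$schema"] = DRAFT7
    return result, dropped


source = {
    "$schema": "https://json-schema.org/draft/2019-09/schema",
    "type": "array",
    "items": [{"type": "string"}],
    "unevaluatedItems": False,
}
downgraded, dropped = downgrade_lossy(source)
print(dropped)      # ['unevaluatedItems']
print(downgraded)
```

The downgraded schema accepts everything the original did, plus arrays with extra elements. The same trick is not safe everywhere: dropping a keyword inside a not subschema would tighten the schema instead of relaxing it, so nested contexts would need more careful rules.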

github-actions · Answer 68 · Sun Jun 16 2024 08:55:37 GMT+0800 (China Standard Time)

Hello! 👋

This issue has been automatically marked as stale due to inactivity 😴

It will be closed in 180 days if no further activity occurs. To keep it active, please add a comment with more details.

There can be many reasons why a specific issue has no activity. The most probable cause is a lack of time, not a lack of interest.

Let us figure out together how to push this issue forward. Connect with us through our Slack channel: https://json-schema.org/slack

Thank you for your patience ❤️