json-schema-org / json-schema-spec

The JSON Schema specification

Home Page: http://json-schema.org/

We should all agree what constitutes an "implementation"

Relequestual opened this issue

We should all agree what constitutes an "implementation". Let's discuss that topic at the next OCWM.

Originally posted by @jdesrosiers in #1435 (comment)

Specifically, what does it mean to be an "implementation of JSON Schema"?

The JSON Hyper Schema repo states that it is a JSON Schema vocabulary that allows for the annotation of JSON with links. Hypermedia is cool, and I think we have a chance to make something easier for people to understand and use than other hypermedia solutions like JSON-LD. (I think JSON:API does an OK job, but it could be far better.)

We could argue that we no longer agree with that statement, but it seems accurate to me.

Unlike other JSON Schema vocabularies we've seen, JSON Hyper Schema carves out a place for itself in a hard-to-understand space.

Strategically, there are some good arguments for not calling a JSON Hyper Schema client "just" a JSON Schema implementation.

Benefits include:

  • Making it clear that there's a distinct purpose to JSON Hyper Schema: hypermedia controls
  • Focusing on hypermedia, which could help avoid confusion and misunderstandings
  • Primarily showcasing power beyond validation: link traversal, action execution, and state management

These sound similar but are subtly different.

A JSON Hyper Schema implementation does not have to provide any validation functionality, and may instead rely on a third-party JSON Schema implementation to provide annotation results. This could even be pluggable.
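
A minimal sketch of that pluggable arrangement, assuming a hypothetical `AnnotationProvider` interface (nothing like it is defined by the spec or by any particular library). The hyper-schema layer only consumes `links` annotations and never validates anything itself. `ToyProvider` is a placeholder for a real third-party evaluator, and `str.format_map` is a naive stand-in for RFC 6570 URI Template expansion.

```python
from dataclasses import dataclass
from typing import Any, Protocol


@dataclass
class Annotation:
    keyword: str            # keyword that produced the annotation, e.g. "links"
    instance_location: str  # JSON Pointer into the instance
    value: Any


class AnnotationProvider(Protocol):
    """Hypothetical interface a third-party JSON Schema implementation could satisfy."""

    def annotate(self, schema: dict, instance: Any) -> list[Annotation]: ...


class ToyProvider:
    """Placeholder evaluator: collects only a top-level "links" keyword, validates nothing."""

    def annotate(self, schema: dict, instance: Any) -> list[Annotation]:
        if "links" in schema:
            return [Annotation("links", "", schema["links"])]
        return []


class HyperSchemaClient:
    """Hyper-schema layer: turns link annotations into concrete links."""

    def __init__(self, provider: AnnotationProvider) -> None:
        self.provider = provider  # swappable: any object with a compatible annotate()

    def links_for(self, schema: dict, instance: dict) -> list[dict]:
        links = []
        for ann in self.provider.annotate(schema, instance):
            if ann.keyword != "links":
                continue
            for ldo in ann.value:
                # Naive template filling; real hyper-schema uses RFC 6570 URI Templates.
                links.append({"rel": ldo["rel"], "href": ldo["href"].format_map(instance)})
        return links


client = HyperSchemaClient(ToyProvider())
schema = {"links": [{"rel": "self", "href": "/things/{id}"}]}
print(client.links_for(schema, {"id": 42}))  # [{'rel': 'self', 'href': '/things/42'}]
```

Swapping `ToyProvider` for an adapter around a real validator would leave `HyperSchemaClient` untouched, which is the kind of flexibility this point is about.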

This also allows a JSON Hyper Schema library to pass a default, context-specific configuration for any validation process to a JSON Schema implementation.

When we talk about a JSON Schema implementation, we usually mean a "JSON Schema [validation]" implementation. Validation is implied.

Overall, I think there are several benefits in terms of positioning and flexibility to not define JSON Hyper Schema implementations as JSON Schema implementations.

I'd very much like to hear more opinions and thoughts on this, and I think we plan to discuss it at our next OCWM. These are my initial notes/thoughts, written down while they're fresh in my mind.

To be clear, the question of "what is an implementation" is a proxy for the real issue, "what is the scope of the spec", specifically the "core" spec. If we say that only validation/annotation implementations are in scope, then anything that doesn't involve validation and/or annotation is not beholden to anything the spec says, and I don't think that's what we want.

Code-gen is a good example. There's no validation or annotation involved. (If you're thinking code-gen could use annotations, remember that annotation requires an instance, which is not present in code-gen.) If we limit the scope of the spec to only validation/annotation implementations, someone could define a code-gen spec that defines a different behavior for $ref because code-gen isn't a validation/annotation implementation and therefore they aren't bound by the definition in the spec. Even if they don't redefine anything, they couldn't claim any relation to JSON Schema because the spec explicitly excludes that type of implementation. That would be very awkward because they're obviously using JSON Schema documents, and structures in those documents are expected to have the same semantics as they do in the spec.
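
To make the code-gen case concrete, here is a minimal sketch (all function names are hypothetical) of a generator that emits a Python `TypedDict` declaration from a schema. No instance exists at any point, yet the generator still has to resolve `$ref` the way the core spec defines it. It only handles same-document JSON Pointer fragments such as `#/$defs/Address`; it ignores `$id`-based base URIs, percent-encoded fragments, keywords adjacent to `$ref`, and `type` arrays.

```python
SCALARS = {"string": "str", "integer": "int", "number": "float",
           "boolean": "bool", "null": "None"}


def resolve_pointer(root: dict, ref: str):
    """Resolve a same-document reference such as "#/$defs/Address" (RFC 6901 pointer)."""
    if not ref.startswith("#"):
        raise NotImplementedError("only same-document references in this sketch")
    node = root
    for token in ref[1:].split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")  # unescape pointer tokens
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def type_hint(schema: dict, root: dict) -> str:
    """Map a (sub)schema to a Python type annotation, following $ref when present."""
    if "$ref" in schema:
        target = resolve_pointer(root, schema["$ref"])
        # Named targets become class references; anonymous ones are inlined.
        return target["title"] if "title" in target else type_hint(target, root)
    kind = schema.get("type")  # assumed to be a single string, not an array of types
    if kind == "array":
        return f"list[{type_hint(schema.get('items', {}), root)}]"
    if kind == "object":
        return schema.get("title", "dict")
    return SCALARS.get(kind, "Any")


def generate_class(schema: dict, root: dict) -> str:
    """Emit a TypedDict declaration for an object schema that has a "title"."""
    lines = [f"class {schema['title']}(TypedDict):"]
    for name, subschema in schema.get("properties", {}).items():
        lines.append(f"    {name}: {type_hint(subschema, root)}")
    return "\n".join(lines)


person = {
    "title": "Person",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "home": {"$ref": "#/$defs/Address"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "$defs": {
        "Address": {"title": "Address", "type": "object",
                    "properties": {"street": {"type": "string"}}},
    },
}
print(generate_class(person, person))
```

If a code-gen tool resolved `#/$defs/Address` differently from a validator, the same schema document would mean two different things, which is the awkwardness described above.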

My understanding and expectation is that the "core" spec defines all the common elements that anything that wants to be part of the JSON Schema family needs to adhere to. That includes validation, annotation, hyper-schema, {whatever}-gen, and more. Whether or not we want to call non-validation/annotation implementations "JSON Schema Implementations", I think the scope of the "core" spec needs to include anything that does some unique evaluation of a JSON Schema document as we've defined a JSON Schema document. If we want/need to say something specifically about validation implementations, that should go in the "validation" spec. We have two specs for that reason.

I think Jason and I are drawing orthogonal lines (as is our way).

Jason is looking at Core (which defines what a JSON Schema document is and how it should be processed) against validation, annotation, generation, and all of the other use cases.

I'm looking at the blurry line between "implementation" and "application." ("Application" meaning "how the implementation is applied," not necessarily an executable.)

Architecturally speaking, an implementation of a specification is the embodiment of the requirements of that specification. Any functionality outside of those requirements is not strictly part of the implementation; rather, that extra functionality uses an implementation.

Practically speaking, that embodiment may be composed directly into an application so that they are indistinguishable. Is it right to call such a tool an implementation, or is the implementation merely an integrated part of the tool?

As an illustration, are either of these an implementation?

  • a CLI validator where all of the JSON Schema logic is built into the application
  • PowerShell's Test-Json cmdlet which internally uses JsonSchema.Net

I can't say, TBH. Architecturally, neither is strictly an implementation; rather, both use implementations. It's just that, in one case, the implementation is separable from the application.

This is how I view a "hyper-schema client." I see it as two parts:

  • hyper-schema processing, which does the annotation and processing of JSON documents
  • a web client, which uses the hyper-schema processing logic

I would consider the first to be an implementation, but not the second.

@gregsdennis What do you think about the code-gen case? There's no validation/annotation component backing that implementation, but it still needs to understand and process $id, $ref, etc. in a way that's consistent with the core spec. Does the "core" spec apply to a code-gen implementation? If not, shouldn't there be some spec that we produce that applies to common constructs used by all implementations, including those that don't involve validation/annotation?

My architect brain likes to draw boxes, and it draws a box for Core. With that mindset, even validation is an extension of Core (even though, pedantically, the Core meta-schema requires Validation and others to work).

This is why I don't know where to draw the line.

I have Core and all of the Core/Validation vocabs implemented in a primary library. Is that in its entirety an implementation? Maybe it's two implementations: one for Core and one for Validation. I put it all in one library because it seemed to me that validation would be the most common use case. I suppose technically I could have one library for just Core and one for validation stuff, but then Core by itself doesn't seem useful to me.

I then have separate libraries for schema, code, and data generation. They don't use the validation logic, but they do use the validation definition. That is, they consider the keywords that the Validation spec defines.
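
A minimal sketch of "using the validation definition without the validation logic", assuming a hypothetical `generate` function (not taken from any of the libraries mentioned). It reads keywords defined by the Validation spec (`type`, `const`, `enum`, `minimum`, `exclusiveMinimum`, `minLength`, `minItems`, `required`) to build one conforming instance, but never runs a validator. It assumes `type` is a single string and numeric bounds are integer-valued, and it ignores upper bounds, `pattern`, `format`, and references.

```python
def generate(schema: dict):
    """Build one instance from Validation keywords, without running any validation."""
    if "const" in schema:
        return schema["const"]
    if "enum" in schema:
        return schema["enum"][0]
    kind = schema.get("type")  # assumed to be a single string, not an array of types
    if kind in ("integer", "number"):
        lower = schema.get("minimum", 0)
        if "exclusiveMinimum" in schema:
            lower = max(lower, schema["exclusiveMinimum"] + 1)
        return lower  # upper bounds and multipleOf are ignored in this sketch
    if kind == "string":
        return "x" * schema.get("minLength", 0)
    if kind == "array":
        return [generate(schema.get("items", {})) for _ in range(schema.get("minItems", 0))]
    if kind == "object":
        properties = schema.get("properties", {})
        # Only required properties are generated; optional ones are left out.
        return {name: generate(properties.get(name, {})) for name in schema.get("required", [])}
    if kind == "boolean":
        return False
    return None  # "null", or no "type" constraint at all


order_schema = {
    "type": "object",
    "required": ["id", "code", "tags"],
    "properties": {
        "id": {"type": "integer", "minimum": 1},
        "code": {"type": "string", "minLength": 3},
        "tags": {"type": "array", "minItems": 2, "items": {"enum": ["a", "b"]}},
        "note": {"type": "string"},
    },
}
print(generate(order_schema))  # {'id': 1, 'code': 'xxx', 'tags': ['a', 'a']}
```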

even validation is an extension of Core

I think of this the same way. Lots of different things can also extend "core". I consider anything that implements "core" (or just "core" by itself) to be a JSON Schema implementation. Anything that implements "validation" is a specific type of JSON Schema implementation, a validation implementation.

the Core meta-schema requires Validation and others to work

I think there's definitely some overlap and we should work toward decoupling these. The dependency lines should only go one direction.

Core by itself doesn't seem useful to me

Core could be used as a generic data format. Imagine you have an API and it would be really convenient and powerful if your data could use references just like your schemas do. You could use the JSON Schema media type with just the core vocabulary. That's really all I think the "core" spec should be. The rest is just coupling it to validation/annotation, which is unnecessary and limiting.
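
Using core's identification (`$id`) and referencing (`$ref`) in ordinary API data might look something like the minimal sketch below. Everything here is an assumption for illustration: the documents and `example.com` URIs are invented, relative references are resolved with RFC 3986 rules via `urllib.parse.urljoin`, only whole-document references are supported, and a reference object is naively replaced by its target (so cyclic references would recurse forever).

```python
from urllib.parse import urldefrag, urljoin


def build_registry(documents: list[dict]) -> dict[str, dict]:
    """Index documents by their "$id" (assumed to be an absolute URI)."""
    return {doc["$id"]: doc for doc in documents}


def deref(node, base: str, registry: dict[str, dict]):
    """Return a copy of node with every {"$ref": ...} object replaced by its target."""
    if isinstance(node, list):
        return [deref(item, base, registry) for item in node]
    if not isinstance(node, dict):
        return node
    if "$id" in node:
        base = urljoin(base, node["$id"])  # an embedded "$id" changes the base URI
    if "$ref" in node:
        target_uri, fragment = urldefrag(urljoin(base, node["$ref"]))
        if fragment:
            raise NotImplementedError("fragment references are not handled in this sketch")
        # Sibling keys next to "$ref" are dropped in this naive replacement.
        return deref(registry[target_uri], target_uri, registry)
    return {key: deref(value, base, registry) for key, value in node.items()}


customer = {"$id": "https://example.com/customers/7", "name": "Ada"}
order = {
    "$id": "https://example.com/orders/1",
    "customer": {"$ref": "../customers/7"},  # relative to the order's "$id"
    "total": 42,
}
registry = build_registry([order, customer])
result = deref(order, order["$id"], registry)
print(result["customer"])  # {'$id': 'https://example.com/customers/7', 'name': 'Ada'}
```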

The dependency lines should only go one direction.

Perhaps a separate conversation, but would you consider the Core meta-schema needing Validation (technically, it's validating core keywords are used properly) to be crossing dependencies? I don't think it is. The Core vocab itself is still independent.


So it seems we agree that, on the axis of Core vs. how it's used, "how it's used" can be considered an implementation. There's a nuance between something being an implementation architecturally vs. practically. I think I'm okay with (loosely) defining implementations as "first degree" (think Kevin Bacon) usages of JSON Schema. That is, they use Core to do a thing, like validate, annotate, generate, etc.

I'd like to explore the "second degree" usages of JSON Schema. These are consumers of the "first degree" implementations. So this would include things like:

  • a CLI that houses validation functionality
  • a client that uses hyper-schema annotations to coordinate web requests

As I mentioned before, the tricky part here is that many times, these don't merely use an implementation, but they have it built in. Architecturally, you can identify the different parts, but practically, it's all a single package/executable. Do we call these implementations?

If VSCode builds in all of its JSON Schema support instead of using someone else's library (Microsoft does like to re-invent the wheel), does that mean that VSCode is an implementation?

would you consider the Core meta-schema needing Validation (technically, it's validating core keywords are used properly) to be crossing dependencies?

No, not at all. If we described the core vocabulary using CUE, it wouldn't mean we have a dependency on CUE. Similarly, I don't think that the core vocabulary meta-schema being described using "validation" means the "core" spec is coupled to "validation".

The main thing I have in mind about core depending on validation is how references are only allowed where a schema is expected and core keywords in a location that is not semantically a schema are ignored. In order to make these distinctions, a "core" implementation needs to know details from the "vocabulary" spec. In fact, it needs to know the details of any third party vocabulary that it wants to support.

I implement core as a separate component from validation with strictly one-way dependencies, which is why I fail tests for ignoring core keywords in places that aren't schemas. Because that component doesn't know about other vocabularies, I have to assume that any time I see those core keywords, they belong there and I should process them as a schema.
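
The coupling described here can be shown with a minimal sketch (function and table names are hypothetical). `find_refs` only looks for `$ref` in schema positions, which requires a table of which keywords take subschemas; that knowledge belongs to vocabularies such as the 2020-12 applicator vocabulary, only part of which is listed. `find_refs_naive` is what a core component with strictly one-way dependencies is left with: it treats every object as a potential schema, so it also reports the `$ref` inside `const`, which is plain data.

```python
# Partial list of 2020-12 keywords whose values are subschemas, by shape.
# In a real implementation this knowledge would come from each vocabulary.
SUBSCHEMA_KEYWORDS = {
    "items": "schema", "not": "schema", "additionalProperties": "schema",
    "properties": "map", "patternProperties": "map", "$defs": "map",
    "allOf": "list", "anyOf": "list", "oneOf": "list", "prefixItems": "list",
}


def escape(token: str) -> str:
    """Escape a key for use in a JSON Pointer (RFC 6901)."""
    return token.replace("~", "~0").replace("/", "~1")


def find_refs(schema, pointer: str = ""):
    """Yield (location, $ref) pairs, descending only into known subschema positions."""
    if not isinstance(schema, dict):  # boolean schemas contain no references
        return
    if "$ref" in schema:
        yield pointer, schema["$ref"]
    for keyword, value in schema.items():
        shape = SUBSCHEMA_KEYWORDS.get(keyword)
        location = f"{pointer}/{escape(keyword)}"
        if shape == "schema":
            yield from find_refs(value, location)
        elif shape == "map":
            for name, sub in value.items():
                yield from find_refs(sub, f"{location}/{escape(name)}")
        elif shape == "list":
            for index, sub in enumerate(value):
                yield from find_refs(sub, f"{location}/{index}")


def find_refs_naive(node, pointer: str = ""):
    """Vocabulary-unaware walk: every object anywhere is treated as a possible schema."""
    if isinstance(node, dict):
        if "$ref" in node:
            yield pointer, node["$ref"]
        for key, value in node.items():
            yield from find_refs_naive(value, f"{pointer}/{escape(key)}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_refs_naive(item, f"{pointer}/{index}")


schema = {
    "properties": {
        "config": {"const": {"$ref": "#/not-a-reference"}},  # plain data, not a schema
        "child": {"$ref": "#/$defs/node"},
    },
    "$defs": {"node": {"type": "object"}},
}
print(list(find_refs(schema)))        # [('/properties/child', '#/$defs/node')]
print(list(find_refs_naive(schema)))  # also reports '/properties/config/const'
```

The extra result from the naive walk is the kind of behavior those tests catch.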

There's a nuance between something being an implementation architecturally vs practically.

I'm in the "practical" camp. I would call anything that exposes some kind of JSON Schema functionality an implementation. However, I recognize that that does leave a gap in our vocabulary. For example, PowerShell both is an implementation and uses an implementation. I'm not sure what words I would use to distinguish between the two truths.

Would you say that PowerShell's functionality is bound by the requirements of JSON Schema? Specifically, they configure JsonSchema.Net to fetch references from the network / file system if needed. Or are they merely using JSON Schema to perform (very) similar functionality?

What about VSCode's linting & intellisense functionality (whether native or via an extension)?

Yes, I think PowerShell is definitely bound by the requirements of JSON Schema.

For VSCode, I would say that anything that has to do only with linting and autocomplete is undefined from a JSON Schema spec point of view, but any features it has that are defined in JSON Schema should be respected, especially core stuff like identification and referencing.

Okay, so what do we do with this? Does it just go in our online glossary? Does it need to be defined in the spec?

I don't know. A glossary entry seems reasonable.

@jdesrosiers Would you like to take on making a PR for the glossary entry?
If so, please self assign this Issue =]

I think an "implementation" is a library or release that itself contains code (not provided by a third party) that does anything defined in the specification.

I see the "secondary" class as mostly "wrappers" or "augmentors".

Is someone up for adding the definition to the glossary?