Effect-TS / schema

Modeling the schema of data structures as first-class values

Home Page: https://effect.website

Cannot do JSONSchema.from on recursive schema because AST.from strips all annotations

alex2 opened this issue

What version of @effect/schema is running?

0.53.0

What steps can reproduce the bug?

import * as JSONSchema from "@effect/schema/JSONSchema";
import * as S from "@effect/schema/Schema";

interface Category {
  readonly name: string;
  readonly categories: ReadonlyArray<Category>;
}

const schema: S.Schema<Category> = S.struct({
  name: S.string,
  categories: S.array(S.suspend(() => schema).pipe(S.identifier("Category"))),
});

// Same as the example in the README but using .from, rather than .to
const jsonSchema = JSONSchema.from(schema);

What is the expected behavior?

I would expect the schema to be generated, since I don't think there's anything inherently From-ish or To-ish about an identifier annotation.

What do you see instead?

$ bun run ./src/makeSchema.ts
344 |       }
345 |     case "Suspend":
346 |       {
347 |         const identifier = AST.getIdentifierAnnotation(ast);
348 |         if (Option.isNone(identifier)) {
349 |           throw new Error("Generating a JSON Schema for suspended schemas requires an identifier annotation");
                      ^
error: Generating a JSON Schema for suspended schemas requires an identifier annotation
      at go (node_modules/@effect/schema/dist/esm/JSONSchema.js:349:17)
      at goWithMetaData (node_modules/@effect/schema/dist/esm/JSONSchema.js:82:22)
      at map (:1:21)
      at node_modules/effect/dist/esm/Option.js:415:97
      at go (node_modules/@effect/schema/dist/esm/JSONSchema.js:157:22)
      at goWithMetaData (node_modules/@effect/schema/dist/esm/JSONSchema.js:82:22)
      at node_modules/@effect/schema/dist/esm/JSONSchema.js:235:16
      at map (:1:21)
      at go (node_modules/@effect/schema/dist/esm/JSONSchema.js:233:36)
      at goWithMetaData (node_modules/@effect/schema/dist/esm/JSONSchema.js:82:22)
      at goRoot (node_modules/@effect/schema/dist/esm/JSONSchema.js:46:22)
      at src/makeSchema.ts:62:20
error: script "schema" exited with code 1 (SIGHUP)

Additional information

For my use case, identifier annotations would ideally be preserved when calling AST.from. Alternatively, goRoot could be exported from JSONSchema so that users can pass an AST directly and inject the annotations back in on the From side, if there's a reason for omitting them all. An allow-list of preserved annotations, or perhaps something else, might also work.

@alex2 I wonder if it wouldn't be more sensible to avoid relying on the identifier annotation and instead generate identifiers internally, solely for the purpose of creating the JSON Schema.
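
A minimal sketch of what "generating identifiers internally" could look like, using a toy AST rather than the real @effect/schema internals. Everything here (`ToyAst`, `toJsonSchema`, the `Def0`-style names) is hypothetical and only illustrates the idea: the generator names each suspended target itself, so no identifier annotation is required to break the cycle.

```typescript
// Toy AST for illustration only; this is NOT the @effect/schema AST.
type ToyAst =
  | { readonly tag: "String" }
  | { readonly tag: "Struct"; readonly fields: Record<string, ToyAst> }
  | { readonly tag: "Array"; readonly item: ToyAst }
  | { readonly tag: "Suspend"; readonly f: () => ToyAst }

type Json = Record<string, unknown>

function toJsonSchema(root: ToyAst): Json {
  // Maps each resolved suspend target to an internally generated name.
  const names = new Map<ToyAst, string>()
  const defs: Record<string, Json> = {}

  const go = (ast: ToyAst): Json => {
    switch (ast.tag) {
      case "String":
        return { type: "string" }
      case "Array":
        return { type: "array", items: go(ast.item) }
      case "Struct": {
        const properties: Record<string, Json> = {}
        for (const [key, value] of Object.entries(ast.fields)) {
          properties[key] = go(value)
        }
        return { type: "object", required: Object.keys(ast.fields), properties }
      }
      case "Suspend": {
        const target = ast.f()
        let name = names.get(target)
        if (name === undefined) {
          name = `Def${names.size}`
          // Register the name before recursing so the cycle terminates.
          names.set(target, name)
          defs[name] = go(target)
        }
        return { $ref: `#/$defs/${name}` }
      }
    }
  }

  const schema = go(root)
  return { ...schema, $defs: defs }
}

// The recursive Category shape from the bug report, without any identifier.
const category: ToyAst = {
  tag: "Struct",
  fields: {
    name: { tag: "String" },
    categories: { tag: "Array", item: { tag: "Suspend", f: () => category } }
  }
}

const categoryJsonSchema = toJsonSchema(category)
```

The trade-off raised in the next reply still applies: generated names are stable only as long as traversal order is, and users lose control over how definitions are named.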

That would be neat, but it may still be nice to give users some control over these identifiers, descriptions, etc. Is there a reason you see identifiers - or annotations more generally - as a To-side-only thing? If that’s your thinking, anyway.

Annotations must be considered a To-side-only thing, otherwise you get inconsistent behaviour. Example:

import * as Arbitrary from "@effect/schema/Arbitrary"
import * as Schema from "@effect/schema/Schema"
import * as FastCheck from "fast-check"

const schema = Schema.NumberFromString.pipe(
  Schema.annotations({
    [Arbitrary.ArbitraryHookId]: () => (fc: typeof FastCheck) => fc.float()
  })
)

// const arb: (fc: typeof FastCheck) => FastCheck.Arbitrary<string>
const arb = Arbitrary.from(schema)

console.log(FastCheck.sample(arb(FastCheck), 2)) // should generate strings

If we copied the annotations when calling .from(), we would end up with an Arbitrary that produces numbers instead of strings.
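
The inconsistency can be reproduced in a toy model that does not depend on @effect/schema or fast-check. The names below (`ToyAst`, `sample`, `fromStrict`, `fromCopying`) are hypothetical; the sketch only assumes that an "arbitrary" annotation overrides the default generator for whatever node carries it.

```typescript
// Toy model only; names and shapes are illustrative, not the @effect/schema API.
type Annotations = { readonly arbitrary?: () => unknown }

interface Leaf {
  readonly tag: "String" | "Number"
  readonly annotations: Annotations
}

interface Transform {
  readonly tag: "Transform"
  readonly from: Leaf
  readonly to: Leaf
  readonly annotations: Annotations
}

type ToyAst = Leaf | Transform

// Produce one sample value; an "arbitrary" annotation overrides the default.
const sample = (ast: ToyAst): unknown => {
  if (ast.annotations.arbitrary !== undefined) return ast.annotations.arbitrary()
  if (ast.tag === "Transform") return sample(ast.to)
  return ast.tag === "String" ? "abc" : 0
}

// Takes the encoded side and drops the annotations of the outer node.
const fromStrict = (ast: ToyAst): ToyAst => (ast.tag === "Transform" ? ast.from : ast)

// Takes the encoded side but copies the outer annotations onto it.
const fromCopying = (ast: ToyAst): ToyAst =>
  ast.tag === "Transform"
    ? { ...ast.from, annotations: { ...ast.from.annotations, ...ast.annotations } }
    : ast

// Analogue of NumberFromString with an arbitrary hook that yields numbers.
const numberFromString: Transform = {
  tag: "Transform",
  from: { tag: "String", annotations: {} },
  to: { tag: "Number", annotations: {} },
  annotations: { arbitrary: () => 1.5 }
}

const strictSample = sample(fromStrict(numberFromString)) // a string
const copyingSample = sample(fromCopying(numberFromString)) // a number, although the side is a string
```

With `fromCopying`, the hook written for the decoded number ends up attached to the encoded string node, which is exactly the mismatch described above.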

Another option is for the JSONSchema module to use its own from function that respects the identifier annotation, precisely for how it's used in suspend.

> we would end up with an Arbitrary that produces numbers instead of strings.

Ah, I assumed there'd be a situation like this but didn't know what it would be, thanks.

> Another option is for the JSONSchema module to use its own from function that respects the identifier annotation, precisely for how it's used in suspend

That would certainly work, though are there any instances where the "@effect/schema/annotation/Identifier" symbol specifically wouldn't apply to both the From and To side? I suppose identifiers from one side can be re-mapped to new identifiers internally by any arbitrary transformation. So, alternatively, and again thinking more generally, could there be some way to annotate the From side directly?

> The Rule of Schemas: Keeping Encode and Decode in Sync

Not exactly what the quote is referring to, but with all annotations held in one container rather than two, it seems like more is lost by AST.from/AST.to than perhaps needs to be.

It depends on the semantics we want to give to the identifier annotation. I have always imagined it in this way: it is the name that a "TypeScript" compiler would use to define a const for that schema, like:

const Age = Schema.number.pipe(Schema.identifier("Age"))

// fantasy compiler:
// given a schema returns the code for defining that schema
const typescript = <I, A>(schema: Schema.Schema<I, A>): string => {
  // ...
}

console.log(typescript(Age)) // => "const Age = Schema.number.pipe(Schema.identifier("Age"))"

I see, in that case I don't see a reason to strip it out when running AST.from, though perhaps the addition of a new symbol in the JSONSchema namespace would allow people to more explicitly narrow the semantics?

Somewhat relatedly, is there a reason for having one annotations container rather than two: one for the From side and one for the To side, allowing them to be swapped around when calling AST.from/to?

> I see, in that case I don't see a reason to strip it out when running AST.from

Yes, if the semantic is "here's a unique identifier for this schema if a compiler were to need it" then we can preserve it when calling from on a schema.
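
As a rough illustration of preserving only the identifier when taking the From side, here is a toy `from` with an allow-list. The AST shape, the `PRESERVED_ON_FROM` list, and the string annotation keys are all assumptions made for this sketch; the real AST.from and its annotation symbols may differ.

```typescript
// Toy model only; not the @effect/schema AST or its annotation symbols.
type Annotations = Readonly<Record<string, unknown>>

type ToyAst =
  | { readonly tag: "Leaf"; readonly type: string; readonly annotations: Annotations }
  | { readonly tag: "Transform"; readonly from: ToyAst; readonly to: ToyAst; readonly annotations: Annotations }
  | { readonly tag: "Suspend"; readonly f: () => ToyAst; readonly annotations: Annotations }

// Hypothetical allow-list of annotations that are safe to keep on the From side.
const PRESERVED_ON_FROM: ReadonlyArray<string> = ["identifier"]

// Copy only the allow-listed keys that are actually present.
const pick = (annotations: Annotations, keys: ReadonlyArray<string>): Annotations => {
  const out: Record<string, unknown> = {}
  for (const key of keys) {
    if (key in annotations) out[key] = annotations[key]
  }
  return out
}

const from = (ast: ToyAst): ToyAst => {
  switch (ast.tag) {
    case "Leaf":
      return ast
    case "Transform":
      // The outer annotations describe the To side, so they are dropped here.
      return from(ast.from)
    case "Suspend":
      // The suspended node is rebuilt, keeping only allow-listed annotations.
      return {
        tag: "Suspend",
        f: () => from(ast.f()),
        annotations: pick(ast.annotations, PRESERVED_ON_FROM)
      }
  }
}

const numberFromString: ToyAst = {
  tag: "Transform",
  from: { tag: "Leaf", type: "string", annotations: {} },
  to: { tag: "Leaf", type: "number", annotations: {} },
  annotations: {}
}

const suspended: ToyAst = {
  tag: "Suspend",
  f: () => numberFromString,
  annotations: { identifier: "NumberFromString", arbitrary: () => 1.5 }
}

// Keeps "identifier", drops "arbitrary".
const encoded = from(suspended)
```

An allow-list keeps the name a JSON Schema generator needs for `$ref` while still dropping hooks, such as the arbitrary one, that only make sense for the To side.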

> is there a reason for having one annotations container rather than two

Is there really a need for two containers? So far, the need for it has not emerged. It would complicate the situation a lot, and APIs would need to be devised to manage them.

> So far, the need for it has not emerged. It would complicate the situation a lot, and APIs would need to be devised to manage them.

Agreed, it's a rough trade-off, and it would need time, thought, and work. Personally I think the complexity is inherent, though. The specific use case I had in mind was adding a description to a field that is an RFC 3339-encoded string on the From side and a Date object on the To side. I'd want to assign different descriptions to the two sides, in particular one on the From side that documents the date string encoding.

There are many such cases where I want to invest a lot of time and energy into thoroughly describing and documenting my application boundaries in this way, in the code itself, ideally without writing out the low-level AST code myself.

That said, unlike identifier, I think currently the only way to add a description without also adding a filter is to directly reference the description annotation symbol, so it's a less well established interface.

You can use description:

import * as S from "@effect/schema/Schema"

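const schema = S.transform(
  // From side: the encoded representation
  S.string.pipe(S.description("describing From: an RFC 3339 string")),
  // To side: the decoded representation
  S.DateFromSelf.pipe(S.description("describing To: a Date")),
  (s) => new Date(s), // decode
  (d) => d.toISOString() // encode to an ISO 8601 string, which is also valid RFC 3339
)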

Ahh, I understand now. So, anything involving two separate representations to be described would exist as a transformation node with distinctly-annotated from and to nodes, which would then be plucked out by AST.from or AST.to. This also explains why my assumption of an isomorphic to/from is wrong. Thanks for bearing with me!
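
That mental model can be sketched with a toy AST in which a transformation node simply holds two independently annotated children. The types and helpers below (`ToyAst`, `from`, `to`, `descriptionOf`) are hypothetical stand-ins, not the library's actual definitions.

```typescript
// Toy model only; not the real AST.from / AST.to.
type ToyAst =
  | { readonly tag: "Leaf"; readonly type: string; readonly description?: string }
  | { readonly tag: "Transform"; readonly from: ToyAst; readonly to: ToyAst }

// Pluck out the encoded side, following nested transformations.
const from = (ast: ToyAst): ToyAst => (ast.tag === "Transform" ? from(ast.from) : ast)

// Pluck out the decoded side, following nested transformations.
const to = (ast: ToyAst): ToyAst => (ast.tag === "Transform" ? to(ast.to) : ast)

const descriptionOf = (ast: ToyAst): string | undefined =>
  ast.tag === "Leaf" ? ast.description : undefined

// Each side carries its own description; the transformation node carries none.
const dateFromString: ToyAst = {
  tag: "Transform",
  from: { tag: "Leaf", type: "string", description: "describing From: an RFC 3339 string" },
  to: { tag: "Leaf", type: "Date", description: "describing To: a Date" }
}

const fromDescription = descriptionOf(from(dateFromString))
const toDescription = descriptionOf(to(dateFromString))
```

Because each description lives on the node it describes, no second annotations container is needed, and nothing has to be swapped when moving between the two sides.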