ahrefs / atd

Static types for JSON APIs

using a different ATD annotation tag name for rescript (and mélange?)

tsnobip opened this issue

Today, ocaml and rescript/mélange share the same ATD annotation tag name: <ocaml> when using the -bs option.

But this is becoming quite a pain: we often share ATD files between "native" ocaml and rescript, and we find ourselves wanting different annotations in each. For example, we often want list in ocaml and array in rescript, genType annotations in rescript and deriving in ocaml, etc.

One solution would be to have a different tag name for rescript, something like <rescript> that would accept the same set of parameters as ocaml.

There are two ways to avoid it being a breaking change:

  1. Have a new -res option for atdgen CLI that would ignore <ocaml> tags.
  2. Make <rescript> tags override <ocaml> tags, but that would mean we would need to add missing values for ocaml, like <ocaml repr="list">.

I'd personally be in favor of 1., but I'm all ears for arguments if some of you prefer option 2. I don't know what the mélange maintainers' preferences are on that: whether they would rather stick to the <ocaml> tag, have their own, share one with rescript, etc.

Please chime in if you're rescript or mélange users that use ATD (👋 @TheSpyder @anmonteiro @wokalski and others I likely forget)

We discussed it internally. I have no preference. Either approach is good. We want to implement it ASAP so any input is appreciated. If you don't want to see this implemented, this information would be appreciated too.

I occasionally consider forking bs-atdgen-codec-runtime to experiment with making it more ReScript-centric. If a ReScript mode is added to atd, that’s probably a good time to do it 🤔

  1. Have a new -res option for atdgen CLI that would ignore <ocaml> tags.

Instead of adding a command-line option, I would consider using a global annotation within the ATD file. Command line options should be reserved to deviate from the default behavior for the target language. The default behavior for the target languages should be encoded in the ATD file so that the recipient of an ATD file can build their code with the usual commands.

  2. Make <rescript> tags override <ocaml> tags, but that would mean we would need to add missing values for ocaml, like <ocaml repr="list">.

This is what I would do to keep things simple.

Alternatively, if folks don't like the privileged treatment given to OCaml, we could name all these languages after a common, abstract dialect such as ML. The annotations that apply to all ML dialects would be <ml ...>. <ocaml ...> annotations would then be specific to OCaml and not apply to rescript or mélange. Due to backward compatibility issues and old habits, it may be hard to implement at this point. Or maybe we could do this:

Transitional phase:

  • for target = rescript, have <rescript ...> override <ml ...> and both <rescript ...> and <ml ...> override <ocaml ...>.
  • deprecate <ocaml ...> in favor of <ml ...> unless the author knows that it applies only to OCaml and not most of its dialects. A deprecation warning would be shown on a case-by-case basis, for example when encountering <ocaml repr="list"> but not when encountering <ocaml repr="int31"> (fictitious).

Post-transition:

  • for target = rescript, use <rescript ...> and use <ml ...> as a fallback. Ignore <ocaml ...>.
  • deprecate <ocaml ...> in favor of <ml ...> unless the author knows that it applies only to OCaml and not most of its dialects.
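The precedence rules in the two phases above can be sketched as an ordered list of annotation sections, consulted from highest to lowest priority. This is only an illustration: the annot record, the phase type, and the rescript_path and lookup functions are hypothetical names, not part of atdgen.

```ocaml
(* Hypothetical model of an ATD annotation: a section name such as
   "ocaml", "ml" or "rescript", plus its key/value fields.
   This is NOT the real atd annotation representation. *)
type annot = { section : string; fields : (string * string) list }

type phase = Transitional | Post_transition

(* Sections consulted for target = rescript, highest priority first. *)
let rescript_path = function
  | Transitional -> [ "rescript"; "ml"; "ocaml" ]
  | Post_transition -> [ "rescript"; "ml" ] (* <ocaml ...> is ignored *)

(* Return the value of [field] from the first section of [path]
   that defines it, or None if no section does. *)
let lookup ~path ~field (annots : annot list) : string option =
  List.find_map
    (fun section ->
      List.find_map
        (fun a ->
          if a.section = section then List.assoc_opt field a.fields else None)
        annots)
    path

(* Example annotations attached to one type expression. *)
let annots =
  [ { section = "ocaml"; fields = [ ("repr", "list") ] };
    { section = "ml"; fields = [ ("name", "Foo") ] };
    { section = "rescript"; fields = [ ("repr", "array") ] } ]

(* <rescript repr="array"> wins over <ocaml repr="list">. *)
let repr = lookup ~path:(rescript_path Transitional) ~field:"repr" annots
```

With only <ocaml repr="list"> present, the transitional path would still find "list", while the post-transition path would find nothing and the generator would fall back to its default.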

By the way, I'm confused by the relationships between the different OCaml dialects. Is the following language family tree correct?

Source:

digraph TB {
  ocaml [label="OCaml"]
  bucklescript [label="BuckleScript", style=dotted]
  reasonml [label="ReasonML", style=dotted]
  rescript [label="ReScript"]
  melange [label="Melange"]

  ocaml -> reasonml
  ocaml -> bucklescript
  reasonml -> rescript
  bucklescript -> rescript
  rescript -> melange
}

Rendered with dot lang.dot -Tpng -o lang.png.

Close, and I can see how you’d get that impression, but it’s more like:

ocaml -> reasonml
ocaml -> bucklescript
reasonml -> melange
bucklescript -> rescript
bucklescript -> melange

With perhaps a dotted line from reasonml to ReScript. The ReScript syntax is inspired by reasonml but the project is an evolution of BuckleScript. Melange is a fork of BuckleScript from before the rename.

Revised family tree:

digraph TB {
  ocaml [label="OCaml"]
  bucklescript [label="†BuckleScript", style=dotted]
  reasonml [label="†ReasonML", style=dotted]
  rescript [label="ReScript"]
  melange [label="Melange"]

  ocaml -> reasonml
  ocaml -> bucklescript
  reasonml -> melange
  reasonml -> rescript [style=dotted]
  bucklescript -> rescript
  bucklescript -> melange
}

There is a 3rd option I guess, allowing multiple <ocaml> tags with a dialect field.

Like:

type foo <ocaml attr="deriving show"><ocaml dialect="rescript" attr="genType">  = {
  bar: int list <ocaml dialect="rescript" repr="array">;
  baz: int <ocaml dialect="native" repr="int64">;
}

What do you think?

This allows quite some flexibility and is not breaking.
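To make the lookup rule concrete, here is a minimal sketch of how a generator might resolve a field when several <ocaml> annotations carry a dialect. It assumes that an annotation for the requested dialect takes precedence, that an annotation without a dialect acts as the fallback, and that annotations for other dialects are ignored; the thread does not spell this out. The annot record and the lookup function are hypothetical, not atdgen code.

```ocaml
(* Hypothetical model of an <ocaml ...> annotation with an optional
   dialect="..." field. Not the real atd representation. *)
type annot = { dialect : string option; fields : (string * string) list }

(* Resolve [field] for the requested [dialect].
   Assumption: dialect-specific annotations win, annotations without a
   dialect are the fallback, annotations for other dialects are skipped. *)
let lookup ~dialect ~field (annots : annot list) : string option =
  let find pred =
    List.find_map
      (fun a -> if pred a.dialect then List.assoc_opt field a.fields else None)
      annots
  in
  match find (fun d -> d = Some dialect) with
  | Some _ as v -> v
  | None -> find (fun d -> d = None)

(* type foo <ocaml attr="deriving show"><ocaml dialect="rescript" attr="genType"> *)
let foo_annots =
  [ { dialect = None; fields = [ ("attr", "deriving show") ] };
    { dialect = Some "rescript"; fields = [ ("attr", "genType") ] } ]

(* bar: int list <ocaml dialect="rescript" repr="array"> *)
let bar_annots = [ { dialect = Some "rescript"; fields = [ ("repr", "array") ] } ]

let res_attr = lookup ~dialect:"rescript" ~field:"attr" foo_annots
let native_attr = lookup ~dialect:"native" ~field:"attr" foo_annots
```

Under these assumptions res_attr is Some "genType" and native_attr is Some "deriving show", while looking up repr on bar for the native dialect yields None, so the default list representation would be kept.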

That seems fine, if a little verbose. Wouldn’t that still require a new command line flag to choose the dialect?

I think Paul assumed this would be taken into account with -bs

Yes indeed, this would work with the existing -bs option.

@tsnobip <ocaml dialect="rescript" ...> looks fine to me:

  • easy to understand
  • easy enough to implement
  • no backward-compatibility issues as long as the dialect names don't change :-P
  • a bit verbose but I don't think it would hurt readability

I've spent some time digging into the code and thinking about the feature, and I've realized that today there's no way to know the target platform when generating the type files (-t option), so we would need some kind of CLI flag to provide this information. But this made me realize that the feature could be useful for more than just differentiating between rescript and native: there are cases where you want your OCaml representation to be different (even on the same target platform) while using the same json serialization.

For example, you'd want the package that produces the json object to have a classic variant for better error messages, while the consumer package represents it as a polymorphic variant to compose it with some other polymorphic variant. The same can happen with list and array, etc.

So instead of being just a flag, like atdgen -t -bs, it could be an option like atdgen -t -build-var producer or atdgen -t -build-var res, and this would allow to take into account the <ocaml build-var="producer" repr="classic"> or <ocaml build-var="res" repr="classic"> tags. This would be a more general solution than just differentiating ocaml/rescript builds.

Looking at the existing implementation, an even simpler solution would be to use custom tags such as <producer repr="classic"> or <res repr="classic">, which are less verbose and simpler to implement. Indeed, we would just have to add the build variable as an optional extra tag to the tags that are defined for the given target, via an optional parameter in path_of_target here:

atd/atdgen/src/ocaml.ml

Lines 132 to 138 in 245d302

let path_of_target (target : target) =
  match target with
  | Default -> [ "ocaml" ]
  | Biniou -> [ "ocaml_biniou"; "ocaml" ]
  | Json -> [ "ocaml_json"; "ocaml" ]
  | Bucklescript -> [ "ocaml_bs"; "ocaml" ]
  | Validate -> [ "ocaml_validate"; "ocaml" ]

And this solution is still not breaking.
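A minimal sketch of that change, assuming the build variable is simply prepended to the existing path so that <res ...> or <producer ...> annotations are consulted first. The target type is reconstructed from the match arms above, and the ?build_var parameter is a hypothetical name, not something atdgen defines today.

```ocaml
(* Reconstructed from the match arms of the quoted snippet; the real
   definition lives in atdgen and may differ. *)
type target = Default | Biniou | Json | Bucklescript | Validate

(* Hypothetical variant of path_of_target: when a build variable is
   given (e.g. from a -build-var option), its name becomes the
   highest-priority annotation section. Without it, the result is
   unchanged, which is what keeps the change non-breaking. *)
let path_of_target ?build_var (target : target) : string list =
  let base =
    match target with
    | Default -> [ "ocaml" ]
    | Biniou -> [ "ocaml_biniou"; "ocaml" ]
    | Json -> [ "ocaml_json"; "ocaml" ]
    | Bucklescript -> [ "ocaml_bs"; "ocaml" ]
    | Validate -> [ "ocaml_validate"; "ocaml" ]
  in
  match build_var with
  | None -> base
  | Some v -> v :: base

(* With -build-var res: [ "res"; "ocaml_json"; "ocaml" ] *)
let with_var = path_of_target ~build_var:"res" Json

(* Without a build variable: [ "ocaml_json"; "ocaml" ], as today. *)
let without_var = path_of_target Json
```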

What do you guys think?

there are cases where you want your OCaml representation to be different (even on the same target platform) while using the same json serialization

I want you to think about whether this is really necessary and serves us well in the future. I'm worried about the maintenance cost of the project.

My mind is on atdml which would be a rewrite and major simplification of atdgen. What's really important for the future of atd in my opinion is the following:

  1. Ease of maintenance
  2. Support for popular programming languages
  3. Ease of use

an option like atdgen -t -build-var producer or atdgen -t -build-var res, and this would allow to take into account the <ocaml build-var="producer" repr="classic"> or <ocaml build-var="res" repr="classic"> tags.

I like the idea. If we really want the feature, how about tags?

  • The command line could select multiple tags (-tag producer -tag res)
  • The atd annotations could specify multiple tags (<ocaml tag="producer" tag="foo" name="Foo">)
  • An annotation that doesn't specify a tag is assumed to carry all the tags (<ocaml name="Foo"> would be selected by -tag bar but <ocaml tag="foo" name="Foo"> would not be selected by -tag bar).
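The selection rule in these three bullets can be sketched as follows. One point is an assumption rather than something stated above: a tagged annotation is treated as selected when at least one of its tags was given on the command line (as opposed to requiring all of them). The annot record and both functions are hypothetical names.

```ocaml
(* Hypothetical model: an <ocaml ...> annotation with zero or more
   tag="..." fields, plus its remaining key/value fields. *)
type annot = { tags : string list; fields : (string * string) list }

(* [selected] holds the tags passed on the command line (-tag a -tag b).
   An annotation without tags carries all tags, so it is always selected.
   Assumption: a tagged annotation needs at least one matching tag. *)
let is_selected ~selected ~annot_tags =
  match annot_tags with
  | [] -> true
  | tags -> List.exists (fun t -> List.mem t selected) tags

(* Keep only the annotations that apply to this build. *)
let select ~selected (annots : annot list) : annot list =
  List.filter (fun a -> is_selected ~selected ~annot_tags:a.tags) annots

let untagged = { tags = []; fields = [ ("name", "Foo") ] }
let tagged_foo = { tags = [ "foo" ]; fields = [ ("name", "Foo") ] }

(* With -tag bar, only the untagged annotation is kept. *)
let kept = select ~selected:[ "bar" ] [ untagged; tagged_foo ]
```

This reproduces the example from the last bullet: <ocaml name="Foo"> is selected by -tag bar, and <ocaml tag="foo" name="Foo"> is not.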