unicode-org / message-format-wg

Developing a standard for localizable message strings

Extending the registry is not defined and difficult

eemeli opened this issue

With our current registry definition, each <function> may contain multiple <formatSignature> and <matchSignature> elements, each of which separately defines its inputs and options. I think this structure allows too much variation and is difficult to extend.

For context, let us consider a registry that extends our base registry by adding an option icu:skeleton to the :datetime formatter. What does that registry look like, and what changes are required to make that possible and easy to use?


First, this child registry needs some way of indicating which parent registry it's extending. One possibility is a new attribute with a URL value:

<registry extends="https://unicode.org/path/to/messageformat/registry.xml">

Are there other alternatives? Is there some more natively XML-ish way to express this sort of parent-child relationship?


Next, we need a way to identify the parent registry's formatter definition to which we're adding the option. At the moment, as we allow for and use more than one <formatSignature> in <function name="datetime">, this is difficult. I can think of a couple of different solutions:

  1. Add a required unique id on each <formatSignature> and use that to link the parent and child signatures.
  2. Do not support having more than one <formatSignature> and <matchSignature> in each <function>.
  3. Drop the intermediate <formatSignature> and <matchSignature> elements, and indicate otherwise whether the function supports formatting and matching, and which options affect either.
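To make the first option a bit more concrete, here is a minimal sketch of a child registry. It assumes the parent's second :datetime signature has been given the id `datetime-fields`, and that the child points at it with an `extends` attribute on <formatSignature>. Both the id value and that attribute are hypothetical; nothing like them exists in the current registry definition.

```xml
<registry extends="https://unicode.org/path/to/messageformat/registry.xml">
  <function name="datetime">
    <!-- "datetime-fields" is an assumed id on one of the parent's signatures. -->
    <!-- The extends attribute on formatSignature is hypothetical as well. -->
    <formatSignature extends="datetime-fields">
      <option name="icu:skeleton" validationRule="...">
        <description>...</description>
      </option>
    </formatSignature>
  </function>
</registry>
```

The cost of this approach is that every signature in the parent needs a stable id, and renaming one becomes a breaking change for any child registry that refers to it.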

Are there other possibilities that could be considered? One additional required change here is changing the <function name> to be an ID rather than an NMTOKEN.

My own preference would be the third one, because that would also solve another potential problem: Currently, we allow for each function option to have completely independent and different meanings in each <formatSignature> and <matchSignature>. So the permitted values for an option foo can change entirely depending on whether another option bar is set, and where in the message it's used. This is unnecessarily liberal.

Alternatively, we could take the approach used by <alias>, and add a supports=format|match|all attribute to <function> and <option>, with the latter being defined directly under <function>. This way, each option would only have a single definition that's independent of its position in the message.
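A rough sketch of what that might look like, reusing the option names from the current :datetime definition. The placement of `supports` on <function> and <option> is only the proposal described above, not something the registry supports today, and the elided values are left elided.

```xml
<function name="datetime" supports="format">
  <description>...</description>
  <!-- Each option is defined once, directly under the function. -->
  <option name="dateStyle" values="full long medium short" supports="format">
    <description>The predefined date formatting style to use.</description>
  </option>
  <option name="timeStyle" values="full long medium short" supports="format">
    <description>The predefined time formatting style to use.</description>
  </option>
  <option name="calendar" values="..." supports="format">
    <description>The calendar system to use.</description>
  </option>
</function>
```

A function that can also be used as a selector would presumably carry supports="all", with any selector-only options marked supports="match".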

If we choose either the second or third option, we should consider adding an attribute like conflicts or excludes on <option> to allow for the definition of mutually incompatible baskets of options.
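As an illustration only, a `conflicts` attribute could take a space-separated list of option names that must not be set together with the option it's on. The attribute name, its value syntax, and the choice of which options exclude each other are all assumptions made for this sketch.

```xml
<function name="datetime">
  <!-- The style options and the skeleton are treated as mutually incompatible baskets. -->
  <option name="dateStyle" values="full long medium short" conflicts="icu:skeleton"/>
  <option name="timeStyle" values="full long medium short" conflicts="icu:skeleton"/>
  <option name="icu:skeleton" validationRule="..." conflicts="dateStyle timeStyle"/>
  <!-- An option with no conflicts attribute may be combined with any other. -->
  <option name="calendar" values="..."/>
</function>
```

Declaring the conflict on both sides is redundant; whether one side is enough would need to be decided as part of the definition.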

To sketch out this possibility in more detail, here's what the registry.xml could look like if the changes from #560 are also incorporated: https://github.com/eemeli/message-format-wg/blob/extend-registry/spec/registry.xml


Finally, we need a definition of how the merge happens. Do we only support extending the whole <registry>, or also individual <function> and other elements? Do we support localization of the descriptions as an extension, or should that require a manual copy?

I think there are a few other things to call out.

An implementation will probably have its own registry copy, with whatever it has chosen to locally extend included. For example, the icu:skeleton add-on might be in ICU4J's "default registry" directly.

A plug-in library, by contrast, might want to be somewhat implementation agnostic. Its implementer might want it to be entirely self-contained, so its registry file would contain complete descriptions of the selectors and formatters it provides (even if these override the default ones in an implementation). This makes sense because the code being called is separate from that of the implementation. Presumably the plug-in masks any conflicts, at least within its namespace.

In practice, each implementation will define the calling conventions and APIs necessary for this kind of behavior. We might not want to solve this for 2.0, and instead leave some room for implementation experience to inform future standardization. The purpose of the default registry in 2.0 is to set a floor for selectors and formatters functionality and to help ensure message interoperability between implementations. We'll probably want to push implementations to adopt more functions in a standardized way (rather than plugging in separate, different formatters for the same things).

The purpose of the default registry in 2.0 is to set a floor for selectors and formatters functionality and to help ensure message interoperability between implementations.

To expand on that somewhat, an important function of the registry is to communicate to non-formatter-implementation tools how the functions used in MF2 work. We want to make it easy for someone using implementation A together with plugins B and C to have their tools (validation/linting, XLIFF target file template generator, others) understand the relevant parts of how the functions of A, B, and C all work, and which ones override which others.

Fair enough. We should write this into registry.md. We need to make a determination for what is normative. I suspect that we would accept an implementation that didn't provide a registry as conformant on some level. Perhaps this is a different level of compliance? I note that our goals are open to interpretation as to whether this is required for a release. We should seek consensus on that.

Thanks for starting this discussion, it's useful and timely.

Next, we need a way to identify the parent registry's formatter definition to which we're adding the option. At the moment, as we allow for and use more than one <formatSignature> in <function name="datetime">, this is difficult.

I tend to think the opposite: that multiple signatures make it easy to extend the registry.

<!-- The parent registry. -->

<function name="datetime">
  <!-- The root signature with common options. -->
  <formatSignature>
    <option name="dateStyle" values="full long medium short">
      <description>The predefined date formatting style to use.</description>
    </option>
    <option name="timeStyle" values="full long medium short">
      <description>The predefined time formatting style to use.</description>
    </option>
    <option name="calendar" values="buddhist chinese ...">
      <description>The calendar system to use.</description>
    </option>
  </formatSignature>
</function>

<!-- The child registry. -->

<function name="datetime">
  <!-- The extension. -->
  <formatSignature>
    <!-- Options without details inherit from the root signature. -->
    <!-- Options not repeated here are not available in this signature. -->
    <option name="calendar"/>

    <!-- Extend the signature with a custom option. -->
    <option name="icu:skeleton" validationRule="...">
      <description>...</description>
    </option>
  </formatSignature>
</function>

Finally, we need a definition of how the merge happens. Do we only support extending the whole <registry>, or also individual <function> and other elements? Do we support localization of the descriptions as an extension, or should that require a manual copy?

I think using some kind of specificity should be good enough, although I don't have the exact algorithm ready in my head. The two use-cases that I'd like to keep in mind are:

  • I'd like to add a new custom option and allow it only with certain other builtin options.
  • I'd like to add options for a particular locale and allow it together with other builtin options.

@stasm Given that our current registry.xml includes two mutually exclusive <formatSignature> elements for :datetime, could you clarify which of these your example child registry's icu:skeleton option extends? From context I presume that it must be the second one, given that the first one doesn't support anything beyond the high-level timeStyle and dateStyle, but I don't see how that's reflected in your example.

Edit: Or, wait, does the <formatSignature> with icu:skeleton introduce a third mutually exclusive option bag for :datetime? So with this approach, if I wanted to "extend" a <formatSignature>, I would need to copy it from the parent and add another <option> to it, yes? Let's say I do that with eraDisplay, and then the parent registry is updated to add a new calendar value. Do I understand correctly that the calendar value added to the parent won't be valid together with eraDisplay until I manually apply that update to the copied <formatSignature>?

@stasm suggested that each function should have a single "root signature" and then 0 or more formatSignatures / matchSignatures that extend it, but as @eemeli said, that isn't currently required, since :datetime has two formatSignatures and neither is the "root".

But the signatures for :datetime could be refactored so that there's a single root signature: for example, the calendar, numberingSystem, timeZone, and hourCycle options appear in both of the signatures.
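A sketch of how that refactoring might look, following the convention from the earlier example that options without details inherit from the root signature and options not repeated are not available. How the root signature would be marked as such is left open here, and the shared options' values are elided rather than guessed.

```xml
<function name="datetime">
  <!-- Assumed root signature: the options common to both existing signatures. -->
  <formatSignature>
    <option name="calendar" values="..."/>
    <option name="numberingSystem" values="..."/>
    <option name="timeZone" validationRule="..."/>
    <option name="hourCycle" values="..."/>
  </formatSignature>

  <!-- Style-based signature: inherits the shared options by name. -->
  <formatSignature>
    <option name="calendar"/>
    <option name="numberingSystem"/>
    <option name="timeZone"/>
    <option name="hourCycle"/>
    <option name="dateStyle" values="full long medium short"/>
    <option name="timeStyle" values="full long medium short"/>
  </formatSignature>
</function>
```

The field-based signature would repeat the same four shared options by name and then add its own, so a child registry could extend either one without copying the shared definitions.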

So maybe this would be something to add a design doc for?

Per our last call (2024-02-05) this is non-blocking.