unicode-org / message-format-wg

Developing a standard for localizable message strings

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

We should not require `*` if the variant keys exhaust all possibilities

eemeli opened this issue · comments

As currently specified, we require at least one variant of each .match message to have only * keys. Otherwise, we consider the message to have a Missing Fallback Variant data model error.

This requirement ensures that no matter the case, selection on a valid message will always produce some explicitly defined pattern.

It is however possible to define selectors such as :plural or :boolean for which the presence of some set of keys guarantees that selection will succeed, such as ['other'] for the former and ['true', 'false'] for the latter.

If we were to make the current error a runtime Selection Error like "No Variant Selected" instead of a Data Model Error, it would allow for more natural messages, such as:

.match {$exists :boolean}
true {{You have the thing.}}
false {{You don't have it.}}

instead of our current

.match {$exists :boolean}
true {{You have the thing.}}
* {{You don't have it.}}

With access to a registry definition, it would still be possible to validate messages for completeness.

An outline of the changes required for this are available in #560 (comment).

I'm assigning this to @stasm, as he mentioned this topic near the end of today's call, and I just wanted to be sure we had an issue filed on this. Please feel free to take over here.

We argued on this one to the death before.
And we ended up with it being mandatory.

There are good reasons to have a default, and we had some kind of agreement on this for months (years?).

With access to a registry definition

We know that not all tools will have access to the registry.

And even if we do, there are the cases where that does not scale (because there are too many values).
Or the values change too often.

It also means that for everything that might have some kind of "catch all" option we will have to define that in the registry (think yes / no / unknown, or masculine / feminine / unknown)
And now we need a way to specify (in the registry) what value is "catch all"
Some tools will not be able to access the registry.
And there is a risk that various selectors will use different values for catch all (unknown, -1, n/a, *)

Then the selection algorithm needs access to the registry, at runtime.
Which is not the case now.
Or if not the registry, then some other way to specify "and by the way, this is the default"

The actual proposal here is related to this line the syntax specification:

At least one variant MUST exist whose keys are all equal to the "catch-all" key *.

This is the only place where the the catch-all is required. The errors part of the specification defines the "Missing Fallback Variant" error, but it is not referenced anywhere else, which is probably an error.

I do think that multiple selectors make this issue more complicated. The current requirement guarantees that all messages with a matcher will produce a pattern to format. Relaxing this requirement would make it possible to have syntactically valid messages that produce a runtime error (currently undefined!) in which the matcher does not match any of the variants.

An argument can be made that the syntactical requirement for * keyed variants might hide other problems with the matcher. To use @eemeli's example, the current requirement means MF might not notice anything wrong with:

.match {$exists :boolean}
ture {{ <-- typo }}
*    {{I don't know if I exist}}

Also, I note that there is nothing necessarily wrong with the following message, except it is inelegant and lacks some parsimony:

.match {$exists :boolean}
true  {{I exist}}
false {{I don't exist}}
*     {{Unreachable?}}

I have a real example where it was not possible to cover all options.

It is about French plurals, which changed a few years ago to include many as a case (CLDR 38, end of 2020, not that long ago).

So we had thousands of strings in many products that had only "one" / "other" for French.
And the times of submission they passed validations (and would have passed them with MF2).

When CLDR added "many", things continued to work, because all of them had the mandatory "other"
Yes, maybe the selection was a bit off at times. But that was the same as before.

It took a while to fix things (and there might still be unfixed cases here and there :-)
And was not trivial (forcing the strings to be re-sent, app updated and released, translation memories updated)

This also impacted outside:


If we assume that we can have a comprehensive validation and remove the mandatory fallback,
these kind of changes can result in crashes, instead of the grammar being a bit off.
Or (at best) fallback to English? Still worse than bad grammar.

If we thing about browsers, which get updated pretty often, we would suddenly break applications that did nothing wrong, they passed all validations (at the time when they were translated), and worked for years.
But now all of a sudden break because the selectors changed "under them"
Some of these applications might be abandoned.
Or some might be maintained, but critical, so having them crash for a few days is really bad.

Not updated as often as browsers, but CLDR is also used by Windows, macos / iOS, Android, others.
These kind of changes can potentially break thousands of applications.


I agree with Addison that "might hide other problems with the matcher"
But "unhiding" it with a crash might is probably worse :-)

If people use proper lint (as this issue assumes), then the situation will not be hidden for long.
At some point someone will run a lint, and will see "case foo is missing in your bar match"
But it is not an exception in the face of the end user.

So let's not fail selection then for plurals, if an other key is present. To repeat myself from above:

It is however possible to define selectors such as :plural or :boolean for which the presence of some set of keys guarantees that selection will succeed, such as ['other'] for the former and ['true', 'false'] for the latter.

In other words, it is perfectly plausible for e.g. a French plural selector to be given a list of keys ['one', 'other'] and to choose other even if many would be a better fit. In this sense, it could work exactly the same as the MF1 plural selector.

Expanding a list of required keys would need to count as a breaking change, which we probably cannot do, for the reasons @mihnita explains above.

One of the previous arguments about this was:

We would prefer not to have to call functions to find out if a message is valid. If we accept this proposal, the only way to know if a set of variants has sufficient closure is for the implementation to call functions or read the registry. Note that "implementation" here can mean tools or even people writing messages. Requiring the * avoids this at the potential cost of requiring the * ... * variant.

I think accepting this change ought to include removing the requirement for the "all fallback" variant from "validity" constraints (putting the onus on authors/tools to avoid "Missing Fallback Variant" errors).

We document briefly why we didn't chose making * optional in this design

I could accept making * optional, on the theory that it makes our syntax less pedantic, but I would prefer to keep this fairly mild guardrail personally: it's easier to developers what to do.


It occurs to me that this is something that could benefit from the beta period. Whichever route we choose now, we should explicitly ask for feedback about the consequences of the choice. If we allow non-* fallbacks, we should ask if people are getting bit by it. If we require * fallbacks, we should ask if people are finding it annoying to be required to write what will seem like an extra variant. Does that make sense?

In the 2024-01-29 call we decided:

  • we will reject this in LDML45
  • we will seek feedback on this item during tech preview (hence label)
  • we will add warning/notes to the spec (new issue created)

Dropping the out-of-scope? label as that was about the LDML54 release.