unicode-org / message-format-wg

Developing a standard for localizable message strings

Consider forbidding pass-through `.local`

eemeli opened this issue

So here’s an interesting detail that I needed to puzzle through while working on the JS Intl.MessageFormat spec text:

In a message like

.local $foo = {$bar}
{{{$foo :func} and {$bar :func}}}

we do not guarantee that :func will be called twice with the same inputs, effectively because of this line of the Expression Resolution part of the spec:

An implementation MAY perform additional processing when resolving the value of the expression.

In other words, what looks like a no-op assignment isn't, because we allow for implementations to effectively apply any transforms to expressions that don't have an annotation.
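To make the concern concrete, here is a minimal sketch of a toy resolver in Python. It is not taken from any real implementation: the `Wrapped` class, the `wrap_unannotated` switch and the `func_inputs` helper are all hypothetical names, and wrapping is only one assumed form that the permitted "additional processing" could take.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass(frozen=True)
class Wrapped:
    """Hypothetical wrapper an implementation might put around a raw value."""
    value: Any


def resolve(operand: Any, func: Optional[Callable[[Any], Any]], wrap_unannotated: bool) -> Any:
    """Resolve a toy expression: an operand plus an optional function annotation."""
    if func is not None:
        return func(operand)
    # No annotation: the quoted spec sentence permits "additional processing".
    # Wrapping the value is an assumed example of such processing.
    return Wrapped(operand) if wrap_unannotated else operand


def func_inputs(bar: Any, wrap_unannotated: bool) -> List[Any]:
    """Return the inputs that :func receives for {$foo :func} and {$bar :func}."""
    seen: List[Any] = []

    def func(value: Any) -> Any:
        seen.append(value)
        return value

    foo = resolve(bar, None, wrap_unannotated)  # .local $foo = {$bar}
    resolve(foo, func, wrap_unannotated)        # {$foo :func}
    resolve(bar, func, wrap_unannotated)        # {$bar :func}
    return seen
```

With `wrap_unannotated=False` both calls see the same input; with `wrap_unannotated=True` the first call sees a `Wrapped` value while the second sees the raw one.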

Given that we do have .input, is there a use case for allowing just {$bar} as the .local value? I do see the value in allowing for a bare literal value to be used like .local $foo = {some-keyword}, but even then requiring something like :number or :string on it doesn't seem like it would be too much to ask.

The benefit of this change would be to pre-empt a potential source of confusion, as it's not clear to a reader that the $foo and $bar values above can be of completely different types.

This disambiguation isn't needed for .input, as it is establishing a single definition within the message for its variable reference, so .input {$foo} should still be allowed.

The expression {$bar} is being assigned to $foo. This should not be taken to imply that anything happened to $bar. Any additional transforms happened to the expression {$bar}, not to the input variable itself.

Consider this message:

.local $day = {$today :date dateStyle=short}
.local $time = {$today :time timeStyle=short}
{{{$today :datetime dateStyle=long timeStyle=long} is {$day} at {$time}}}

The assignments in .local don't do anything to $today, just to the expressions.

"But @aphillips," you'll say, "your expressions are not just plain assignments."

That's true, but think of the function as the "additional processing" in the spec. If you had .local $day = {$today}, we permit the implementation to evaluate $today (maybe introspecting that it is a date object) when making the value of $day, but not to affect what $today does. We have .local in part because our variables are quasi-immutable.

The expression {$bar} is being assigned to $foo. This should not be taken to imply that anything happened to $bar. Any additional transforms happened to the expression {$bar}, not to the input variable itself.

Yes, agreed. But my point is that it's not obvious that $foo is not necessarily the same as $bar, even though it looks like the value of the latter has just been assigned to the former.

So with this version of your example message:

.local $day = {$today}
{{{$today :datetime dateStyle=long timeStyle=long} is {$day :datetime dateStyle=short}}}

If the value of $today is a number (Unix seconds), the first :datetime call would likely get that primitive number as its input, but the second call could get either:

  1. Some sort of object wrapping the number, as it may have been first processed as if the declaration expression had been {$today :number}, or
  2. The primitive number.

Especially in custom function implementations, this could lead to surprising behaviour if the implementation chooses to go with the first option, and the custom function doesn't support the wrapped primitive.
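A rough Python sketch of that failure mode follows. Everything in it is illustrative: `WrappedNumber` stands in for whatever object an implementation might create, and `strict_datetime` and `tolerant_datetime` are hypothetical custom functions, not part of any MessageFormat API.

```python
from datetime import datetime, timezone


class WrappedNumber:
    """Hypothetical object an implementation might create for {$today}."""

    def __init__(self, value, options=None):
        self.value = value
        self.options = options or {}


def strict_datetime(operand):
    """A custom function that only understands primitive Unix-second numbers."""
    if not isinstance(operand, (int, float)):
        raise TypeError(f"unsupported operand: {type(operand).__name__}")
    return datetime.fromtimestamp(operand, tz=timezone.utc).date().isoformat()


def tolerant_datetime(operand):
    """The same function, unwrapping the hypothetical wrapper first."""
    if isinstance(operand, WrappedNumber):
        operand = operand.value
    return strict_datetime(operand)
```

The strict variant works for `{$today :datetime}` but raises for `{$day :datetime}` if the implementation wrapped the value; the tolerant variant accepts both, which is the kind of behaviour a function author would need to remember to provide.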

We're in violent agreement, I guess? What language can we put in the spec to ensure that our intention (immutability) holds true in this case while still allowing annotation of the value?

PS: I don't agree with forbidding the pass-through.

I'm not sure how forbidding trivial .local declarations solves the problem. Consider:

.local $foo = {$bar :func}
{{{$bar :func} and {$foo}}}

If a user expects that the meaning of {{{$bar :func} and {$foo}}} can be determined by replacing $foo with {$bar :func}, they would be wrong. The spec makes no promises about the relationship between the meaning of the expression {$foo} and the meaning of the expression {$bar :func}. The text "An implementation MAY perform additional processing when resolving the value of the expression" means an implementation is free to ignore the right-hand side of every declaration and bind every variable to the empty string.

In other words, the only thing the spec guarantees is that in its scope, $foo is bound to some resolved value.

I think what's confusing is that the substitution property doesn't hold, and .local $foo = {$bar} is just the simplest example that demonstrates that it doesn't hold.

If a user expects that the meaning of {{{$bar :func} and {$foo}}} can be determined by replacing $foo with {$bar :func}, they would be wrong. The spec makes no promises about the relationship between the meaning of the expression {$foo} and the meaning of the expression {$bar :func}.

A relevant part of the spec is under Variable Resolution, where we have:

To resolve the value of a variable, its name is used to identify either a local variable or an input variable. If a declaration exists for the variable, its resolved value is used.

In other words, with the example message that includes the .local, the user's expectation in this case is correct, and the suggested replacement does provide the same result.
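As a minimal sketch of that reading of Variable Resolution, the toy Python resolver below evaluates each declaration once and reuses the stored value for later references. The `resolve_message` helper and its tuple format are assumptions made for illustration, and this toy performs no additional processing on unannotated expressions.

```python
def resolve_message(declarations, inputs, functions):
    """Resolve toy .local declarations in order and return the variable scope.

    declarations: list of (name, operand_variable, function_name_or_None)
    inputs: mapping of input variable names to values
    functions: mapping of function names to callables
    """
    scope = dict(inputs)
    for name, operand, func in declarations:
        value = scope[operand]
        if func is not None:
            value = functions[func](value)
        # Later references to $name reuse this resolved value.
        scope[name] = value
    return scope
```

Under this model, after `.local $foo = {$bar :func}` the value used for `{$foo}` is exactly what `:func` returned for `$bar`.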

In other words, with the example message that includes the .local, the user's expectation in this case is correct, and the suggested replacement does provide the same result.

Ah, I understand now. I was confused by the spec language and thought that "additional processing" could be applied to any right-hand side. I submitted #631 in the hopes of clarifying the spec.

I still have objections, but I'll write a separate comment for that.

If I understand the spec correctly, then in:

.local $bar = {|bar|}
.local $foo = {$bar}

the implementation might bind $foo to something other than the string "bar" (it is free to do additional processing on {$bar}).

Also, in:

.local $bar = {|bar| :func}
.local $foo = {$bar}

(supposing that func is defined as a formatting function), the implementation might also do additional processing on {$bar} and bind the result of that processing to $foo. However, the implementation must not do additional processing on {|bar| :func} before binding it to $bar.

Your proposed change would forbid both of those examples. But an alternative would be to say that the second example is allowed, while the first one isn't. In other words, a variable can be aliased only if it's already bound to something that's annotated.

There's precedent for this alternative, since currently:

.local $bar = {1 :number}
.match $bar

is allowed, while

.local $bar = {|1|}
.match $bar

is not (it would be a "missing selector annotation" error). Conceptually, a dataflow analysis has to be done to determine if $bar is bound to something with a selector annotation.
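One way to picture that analysis, and the "alias only what is already annotated" rule, is the small Python sketch below. The `has_annotation` helper and the dictionary encoding of declarations are hypothetical and not drawn from the spec or any implementation; undeclared (input) variables are treated as unannotated here, which is an assumption.

```python
def has_annotation(name, declarations, _seen=None):
    """Return True if $name is bound, directly or via aliases, to an annotated expression.

    declarations maps a variable name to (operand, function_or_None), where
    operand is ("var", other_name) or ("literal", text).
    """
    seen = set() if _seen is None else _seen
    if name in seen or name not in declarations:
        return False  # cycle guard, or an undeclared (input) variable
    seen.add(name)
    operand, func = declarations[name]
    if func is not None:
        return True  # the declaration itself carries an annotation
    kind, value = operand
    # An unannotated alias inherits the answer from the variable it refers to.
    return kind == "var" and has_annotation(value, declarations, seen)
```

With this check, `.local $foo = {$bar}` would be accepted after `.local $bar = {|bar| :func}` and rejected after `.local $bar = {|bar|}`.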

The "additional processing" language in the spec basically amounts to an implicit type coercion (like in your example with .local $day = {$today}, where $today may be coerced from a number to an object with extra fields), and implicit type coercions are always going to introduce surprise complexity. But I still don't think "forbid aliasing of unannotated variables, but allow aliasing of unannotated literals or annotated expressions" is the obvious solution. People might not need to write something like .local $foo = {$bar}, but tools that generate code or do source-to-source transformations on code might generate all kinds of surprising things.

Is this issue still important for 45? Can I close it? Or change to feedback?

The core issue is probably arcane enough that it doesn't really matter, provided that implementations only do reasonable things when resolving {$bar}, and functions do similar things to primitive and wrapped values of similar type.

So I think this is something to return to once the ICU4C and ICU4J implementations are in a state that we can play around with them and see how they work.