unicode-org / message-format-wg

Developing a standard for localizable message strings



Should we really be using `{{pattern}}` and `|literal|` delimiters?

eemeli opened this issue

The syntax of MessageFormat 2 is the result of a long chain of discussions, arguments, compromises, and the balancing of multiple different stakeholders and concerns. While it is quite capable of fulfilling the demands put upon it, it is literally a design by committee.

While I strongly support our work and our results, I remain concerned about the design decisions we've made specifically regarding our {{pattern}} and |literal| delimiters, and about how weird they are. We have, quite explicitly, ended up choosing string delimiters that are not commonly used as string delimiters, so that embedding MF2 strings within programming languages or JSON does not require internal escapes, and to reduce how often message contents need to include escapes.

To rationalise our decisions, we have multiple overlapping design documents tracing our path to where we are now; documents that we've argued about and sometimes voted on to unblock our progress. As far as I know, we do not have a single succinct document explaining why these delimiters are the way they are.

As we are now approaching a complete definition of the language and publishing it as a tech preview, I think the delimiters are a specific point on which we ought to be ready to accept some criticism, and which we may need to reconsider for our final release. The base assumptions that I believe we may have misestimated include:

  • How common it is to include MF2 messages in a programming language or other context where specific delimiters are required of strings, and alternatives are not available. Pattern delimiters in particular are almost always needed only in messages for which a multi-line presentation is useful but not necessary. This reduces how often conflicts with e.g. " would arise: many programming languages only support multi-line strings with delimiters like ` and """, which we could specifically avoid.
  • How difficult it is to manually escape string delimiter characters, when they do occur in MF2 source and are not avoidable by using a different string delimiter.
  • How much of an impediment to adoption using unusual delimiters might be. As we've discussed on multiple occasions, we do not expect really anyone to become a "MessageFormat 2 developer". The result of our work is an auxiliary message formatting language that users will only deal with on occasion. Within that context, I think greater weight should be put on not deviating from contextual assumptions, such as "how to quote stuff".
  • When less technical users interact with MF2 source, what is the surrounding context in which this happens, and what restrictions does that format impose on their work? In other words, if e.g. MF2 messages are embedded in a .properties file that a translator is manually working with, that format does not impose any quoting requirements on message values. In that context, would the user be better served by more common delimiters that may need escaping when they occur within a message body, or by our current {{braces}} and |bars|?
  • What is the appropriate lesson to take from ICU MessageFormat's choices to use ' as an escape character, and to support multiple different "apostrophe modes"? Is it that the needs for escaping should be minimised, or that the rules and practices of escaping should be regularised? With MF2 we've clearly aimed for the former (e.g. limiting which characters may be \ escaped in pattern text vs. literal text), but is that really the only lesson to take here? Could we also consider choosing surprising syntax to be a source of potential errors that we ought to avoid?
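For readers less familiar with the ICU behaviour referred to in the last point, here is a simplified sketch of how an apostrophe in ICU MessageFormat pattern text is interpreted under the two apostrophe modes, `DOUBLE_OPTIONAL` (ICU's default) and `DOUBLE_REQUIRED`. It is not ICU's implementation: it only handles plain pattern text, and the set of syntax characters used here (`{`, `}`, `#`, `|`) is an assumed simplification of ICU's context-dependent rules.

```python
# Simplified model of ICU MessageFormat apostrophe handling in pattern text.
# Assumption: SYNTAX_CHARS approximates ICU's rules, where '#' only matters in
# plural style and '|' only in choice style.
SYNTAX_CHARS = set("{}#|")


def unquote_icu(text: str, mode: str = "DOUBLE_OPTIONAL") -> str:
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch != "'":
            out.append(ch)
            i += 1
        elif i + 1 < n and text[i + 1] == "'":
            # '' is always a literal apostrophe, in both modes
            out.append("'")
            i += 2
        elif mode == "DOUBLE_REQUIRED" or (i + 1 < n and text[i + 1] in SYNTAX_CHARS):
            # A single apostrophe starts quoted literal text, which runs
            # to the next single apostrophe (or to the end of the text).
            i += 1
            while i < n:
                if text[i] == "'":
                    if i + 1 < n and text[i + 1] == "'":
                        out.append("'")  # '' inside quoted text
                        i += 2
                        continue
                    i += 1  # closing apostrophe
                    break
                out.append(text[i])
                i += 1
        else:
            # DOUBLE_OPTIONAL: a lone apostrophe before ordinary text is literal
            out.append("'")
            i += 1
    return "".join(out)


print(unquote_icu("don't"))                     # don't
print(unquote_icu("don't", "DOUBLE_REQUIRED"))  # dont
print(unquote_icu("'{'name'}'"))                # {name}
```

The `don't` case is the kind of surprise the question points at: identical source text is read differently depending on a mode setting that is not visible in the message itself.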

Finally, to illustrate what this is all about, consider this MF2 message, using our current syntax:

.input {$count :number}
.local $kind = {|"Granny Smith"|}
.match {$count}
0 {{no {$kind} apples}}
one {{{$count} {$kind} apple}}
* {{{$count} {$kind} apples}}

If we were to allow for more normal pattern and literal delimiters, this same message could read as:

.input {$count :number}
.local $kind = {'"Granny Smith"'}
.match {$count}
0 "no {$kind} apples"
one "{$count} {$kind} apple"
* "{$count} {$kind} apples"
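As a rough way to make the first concern listed in the issue concrete, the following sketch counts how many characters in each version of the message would need a backslash escape if the message were embedded in a host string delimited by `"` (as in JSON) or by `'`. It assumes the host format only requires escaping its own delimiter and the backslash; the second message uses the hypothetical syntax from the issue, not valid MF2.

```python
# Rough sketch: count characters that would need a backslash escape if each
# message were embedded in a host string delimited by `quote`.
# Assumption: the host only requires escaping the delimiter and '\'.

CURRENT = """\
.input {$count :number}
.local $kind = {|"Granny Smith"|}
.match {$count}
0 {{no {$kind} apples}}
one {{{$count} {$kind} apple}}
* {{{$count} {$kind} apples}}"""

# Hypothetical syntax proposed in the issue (not valid MF2).
ALTERNATIVE = '''\
.input {$count :number}
.local $kind = {'"Granny Smith"'}
.match {$count}
0 "no {$kind} apples"
one "{$count} {$kind} apple"
* "{$count} {$kind} apples"'''


def escapes_needed(src: str, quote: str = '"') -> int:
    """Number of characters in src that the host string would have to escape."""
    return sum(1 for ch in src if ch == quote or ch == "\\")


for name, src in [("current", CURRENT), ("alternative", ALTERNATIVE)]:
    print(name, escapes_needed(src, '"'), escapes_needed(src, "'"))
# current 2 0
# alternative 8 2
```

The count says nothing about readability, which is the issue's main argument for the alternative; it only quantifies the embedding cost that the current delimiters were chosen to avoid.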

While I appreciate that the alternative syntax would carry some costs, I believe that its benefits in readability and lack of weirdness outweigh the negatives. Therefore, I ask that we be open to discussing these choices further during the tech review phase.

(chair hat on)

@eemeli was asked to file this issue following discussion in the 2024-01-15 teleconference. In that call, we explicitly discussed that this is out-of-scope for LDML45. The MFWG will not consider any further normative preferential changes to the ABNF or syntax in this release. Only editorial ("cleanup") or technical errors ("bugs") within the current design will be considered in this release.

This comment is strictly to document that fact. It is neither an endorsement nor a rejection of this issue.

My take on this


How common it is to include MF2 messages in a programming language or other context where specific delimiters are required of strings, and alternatives are not available

Very often.
And it is not only about programming languages, but also file formats.

There are many formats that delimit their own messages with ", or require " to be escaped.

So 4 of the most common file formats explicitly designed for localization use ", with no alternative.


How difficult it is to manually escape string delimiter characters

Let's take this:

.match {$button :string}
subscribe {{Click "Subscribe" to stop receiving emails}}
unsubscribe {{Click "Subscribe" to ...}}

If we now replace {{...}} with quotes in our syntax, this becomes

.match {$button :string}
subscribe "Click \"Subscribe\" to stop receiving emails"
unsubscribe "Click \"Subscribe\" to ..."

And next we store the message in code / JSON / etc:

{
"msg": ".match {$button :string} subscribe \"Click \\\"Subscribe\\\" to stop receiving emails\" unsubscribe \"Click \\\"Subscribe\\\" to ...\""
}
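The two layers of escaping above can be reproduced mechanically. This sketch uses a hypothetical `quote_pattern` helper for the quote-delimited syntax (backslash-escaping `\` and `"`, as in the example; this is not part of any MF2 specification) and Python's standard `json.dumps` for the JSON layer.

```python
import json


def quote_pattern(text: str) -> str:
    """Hypothetical helper: wrap pattern text in double quotes for the
    quote-delimited syntax, backslash-escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


pattern = 'Click "Subscribe" to stop receiving emails'

braces = "{{" + pattern + "}}"   # current syntax: no escaping inside {{...}}
quoted = quote_pattern(pattern)  # hypothetical syntax: one escaping layer

print(quoted)              # "Click \"Subscribe\" to stop receiving emails"
print(json.dumps(braces))  # "{{Click \"Subscribe\" to stop receiving emails}}"
print(json.dumps(quoted))  # "\"Click \\\"Subscribe\\\" to stop receiving emails\""
```

With {{...}} delimiters, the JSON layer adds one backslash per quote in the message; with quote delimiters, each quote inside the pattern ends up as three backslashes plus the quote.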

What is the appropriate lesson to take from ICU MessageFormat's choices to use ' as an escape character

That it is a bad idea to require escaping for characters commonly used in the body of localized messages, and that WYSIWYG is best.