unicode-org / message-format-wg

Developing a standard for localizable message strings

Missing MF1 functionality to consider for MF2

aphillips opened this issue

In creating #564 I found the following gaps in our MF1 support in the default registry:

  • ordinal format

    • I propose we add this
  • spellout format

    • I propose we do NOT add this
  • duration format

    • I propose we do NOT add this; other formatters for handling ranges, durations, and so forth can be considered in the future
  • choice selector

    • I propose we do NOT add this, although we have discussed comparison operators in the past
  • number skeletons

    • I propose we do NOT add this in the spring release
  • date skeletons

    • we already decided not to add this in the default registry; ICU may provide it as icu:skeleton??

Ping @sffc, @ryzokuken, and @gibson042 for your thoughts on ordinal formatting. Including that in the MF2 core registry would mean needing JS support for it as well, so something like this might also need to become possible:

const ord = new Intl.NumberFormat('en', { style: 'ordinal' });
ord.format(42); // '42nd'

The CLDR includes the data required for this, but I don't remember exactly where.

It's in the charts here. Look for ordinal in the table.

It's in the file ordinals.xml in the data here.

402 issue: tc39/ecma402#494

We don't have it yet in ICU4X. Needs design work on the 402 side as well.

Even if CLDR has the data, ICU (and Intl?) don't have formatters that do this.
Among other things, it requires gender information, even without spellout.
For example, in Spanish you use ª (fem) vs º (masc).

ª  U+00AA  FEMININE ORDINAL INDICATOR
º  U+00BA  MASCULINE ORDINAL INDICATOR

In Romanian you would use a 2-a (fem) vs al 2-lea (masc), even without spellout.

TLDR:
ICU and Intl can only do ordinal selection, not ordinal format (except for spellout).

So I would do the same: not include ordinal for format, only for selection.
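
To make the selection/format distinction concrete, here is a minimal sketch of what ordinal selection already offers in JS. `Intl.PluralRules` with `type: 'ordinal'` returns only a plural category; turning that into text is left to the message author. The suffix table and the `formatEnglishOrdinal` helper are hypothetical, hand-written for English only, and are not part of any API or of CLDR.

```javascript
// Ordinal *selection*: maps a number to a plural category for the locale.
const ordinalRules = new Intl.PluralRules('en', { type: 'ordinal' });

// Assumption for illustration: a hand-written English suffix table.
// A real message would carry these strings per locale (and per gender
// where the language needs it).
const enSuffixes = { one: 'st', two: 'nd', few: 'rd', other: 'th' };

// Hypothetical helper: does by hand what an ordinal *formatter* would do.
function formatEnglishOrdinal(n) {
  const category = ordinalRules.select(n);
  return `${n}${enSuffixes[category]}`;
}

console.log(ordinalRules.select(42));  // category only, no formatted text
console.log(formatEnglishOrdinal(42)); // number plus the selected suffix
```

The same approach breaks down for Spanish or Romanian, where the category alone is not enough and the message also has to select on gender.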

I'm aware of the shortcomings, but I note that there is an ordinal formatter in ICU MessageFormat. Claiming that there isn't one is incorrect. You're right that the data is sparse and that it doesn't account for gender etc. properly. But this code exists:

    public static void ordinalsGalore() {
        for (Locale l : sortAllLocales()) {
            com.ibm.icu.text.MessageFormat mf = new com.ibm.icu.text.MessageFormat("{6}: {0,ordinal} {1,ordinal} {2,ordinal} {3,ordinal} {4,ordinal} {5,ordinal}");
            mf.setLocale(l);
            Object[] args = {1, 2, 3, 4, 5, 6, l.toLanguageTag()};
            System.out.println(mf.format(args));
        }
    }

It observably produces poor output for many locales:

72.1.0.0
af: 1ste 2de 3de 4de 5de 6de
af-Latn-ZA: 1ste 2de 3de 4de 5de 6de
af-NA: 1ste 2de 3de 4de 5de 6de
af-ZA: 1ste 2de 3de 4de 5de 6de
agq: 1th 2th 3th 4th 5th 6th
agq-CM: 1th 2th 3th 4th 5th 6th
agq-Latn-CM: 1th 2th 3th 4th 5th 6th
ak: 1. 2. 3. 4. 5. 6.
ak-GH: 1. 2. 3. 4. 5. 6.
ak-Latn-GH: 1. 2. 3. 4. 5. 6.
am: 1ኛ 2ኛ 3ኛ 4ኛ 5ኛ 6ኛ
am-ET: 1ኛ 2ኛ 3ኛ 4ኛ 5ኛ 6ኛ
am-Ethi-ET: 1ኛ 2ኛ 3ኛ 4ኛ 5ኛ 6ኛ
ar: ١. ٢. ٣. ٤. ٥. ٦.
ar-001: ١. ٢. ٣. ٤. ٥. ٦.
ar-AE: 1. 2. 3. 4. 5. 6.
ar-Arab-EG: ١. ٢. ٣. ٤. ٥. ٦.
ar-BH: ١. ٢. ٣. ٤. ٥. ٦.
...
zh-Hant-TW: 第1 第2 第3 第4 第5 第6
zh-MO: 第1 第2 第3 第4 第5 第6
zh-SG: 第1 第2 第3 第4 第5 第6
zh-TW: 第1 第2 第3 第4 第5 第6
zu: 1th 2th 3th 4th 5th 6th
zu-Latn-ZA: 1th 2th 3th 4th 5th 6th
zu-ZA: 1th 2th 3th 4th 5th 6th