denoland / icu

For floating patches on top of https://chromium.googlesource.com/chromium/deps/icu.git

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Name: icu
URL: https://github.com/unicode-org/icu
Version: 68.1
CPEPrefix: cpe:/a:icu-project:international_components_for_unicode:68.1
License: MIT
Security Critical: yes

Description:
This directory contains the source code of ICU 68.1 for C/C++.

A. How to update ICU

1. Run "scripts/update.sh <version>" (e.g. 68-1).
   This will download ICU from the upstream git repository.
   It does preserve Chrome-specific build files and
   converter files. (see section C)

   source.gni and icu.gyp* files are automatically updated, too.

2. Review and apply patches/changes in "D. Local Modifications" if
   necessary/applicable. Update patch files in patches/.

3. Follow the instructions in section B on building ICU data files

B. How to build ICU data files


Pre-built data files are generated and checked in with the following steps

1. icu data files for Chrome OS, Linux, Mac and Windows

  a. Make a icu data build directory outside the Chromium source tree
     and cd to that directory (say, $ICUBUILDIR).

  b. Run
       ${CHROME_ICU_TREE_TOP}/scripts/make_data_all.sh

     This script takes the following steps:

     i) Run
        ${CHROME_ICU_TREE_TOP}/source/runConfigureICU Linux --disable-layout --disable-tests

     ii) Run make

     iii) (cd data && make clean)

     iv) scripts/config_data.sh common
       This configure the build with filer for common.

     v) Run make

     vi)  scripts/copy_data.sh common
       This copies the ICU data files for non-Android platforms
       (both Little and Big Endian) to the following locations:

       common/icudtl.dat
       common/icudtb.dat

     vii) Repeat step iii) - vi) for chromeos to produce chromeos/icudtl.dat

     viii) cast/patch_locale.sh
       Modify the file for cast, android, ios and flutter.

     ix) Repeat step iii) - vi) for cast, andriod and ios to produce
       cast/icudtl.dat
       andriod/icudtl.dat
       ios/icudtl.dat

     x) flutter/patch_brkitr.sh
       On top of cast/patch_locale.sh.sh (step viii)), further patch
       the code for flutter.

     xi) Repeat step iii) - vi) for flutter to produce
       flutter/icudtl.dat

     xii) scripts/clean_up_data_source.sh

     This reverts the result of cast/patch_locale.sh and flutter/patch_brkitr.sh
     make the tree ready for committing updated ICU data files for
     non-Android and Android platforms.

  c. Whenever data is updated (e.g timezone update), take step b as long
     as the ICU build directory used in a. is kept.

2. Note on the locale data customization

  - filter/chromeos.json
      a. Filter the locale data for ChromeOS's UI langauges :
         locales, lang, region, currency, zone
      b. Filter the locale data for non-UI languages to the bare minimum :
         ExemplarCharacters, LocaleScript, layout, and the name of the
         language for a locale in its native language.
      c. Filter the legacy Chinese character set-based collation
         (big5han/gb2312han) that don't make any sense and nobdoy uses.

  - filter/common.json
      Same as above in filter/chromeos.json, AND
      e. Filter exemplar cities in timezone data (data/zone).

  - filter/android.json and filter/ios.json
      a. Filter the locale data for Android / iOS UI langauges :
         locales, lang, region, currency, zone
      b. Filter the locale data for non-UI languages to the bare minimum :
         ExemplarCharacters, LocaleScript, layout, and the name of the
         language for a locale in its native language.
      c. Filter the legacy Chinese character set-based collation
      d. Filter source/data/{region,lang} to exclude these data
         except the language and script names of zh_Hans and zh_Hant.
      e. Keep only the minimal calendar data in data/locales.
      f. Include currency display names for a smaller subset of currencies.
      g. Minimize the locale data for 9 locales to which Chrome on Android
         is not localized.


C. Chromium-specific data build files and converters

They're preserved in step A.1 above. In general, there's no need to touch
them when updating ICU.

1. source/data/mappings
  - convrtrs.txt : Lists encodings and aliases required by the WHATWG
    Encoding spec plus a few extra (see the file as to why).

  - ucmlocal.txt : to list only converters we need.

  - *html.ucm: Mapping files per WHATWG encoding standards for EUC-JP,
    Shift_JIS, Big5 (Big5+Big5HKSCS), EUC-KR and all the single byte encodings.
    They're generated with scripts/{eucjp,sjis,big5,euckr,single_byte}_gen.sh.

  - gb18030.ucm and windows-936.ucm
    gb_table.patch was applied for the following changes. No need
    to apply it again. The patch is kept for the record.
    a. Map \xA3\xA0 to U+3000 instead of U+E5E5 in gb18030 and windows-936 per
    the encoding spec (one-way mapping in toUnicode direction).
    b. Map \xA8\xBF to U+01F9 instead of U+E7C8. Add one-way map
    from U+1E3F to \xA8\xBC (windows-936/GBK).
       See https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c3

2. source/data/brkitr
  - dictionaries/khmerdict.txt: Abridged Khmer dictionary. See
    https://unicode-org.atlassian.net/browse/ICU-9451
  - rules/word_ja.txt (used only on Android)
    Added for Japanese-specific word-breaking without the C+J dictionary.
  - rules/{root,zh,zh_Hant}.txt
    a. Use line_normal by default.
    b. Drop local patches we used to have for the following issues. They'll
       be dealt with in the upstream (Unicode/CLDR).
       http://unicode.org/cldr/trac/ticket/6557
       http://unicode.org/cldr/trac/ticket/4200 (http://crbug.com/39779)

3. Add {an,ku,tg,wa}.txt to source/data/{locale,lang}
   with the minimal locale data necessary for spellchecker and
   and language menus.

D. Local Modifications

1. Applied locale data patches from Google obtained by diff'ing
   the upstream copy and Google's internal copy for source/data

  - patches/locale_google.patch:
    * Google's internal ICU locale changes
    * Simpler region names for Hong Kong and Macau in all locales
    * Currency signs in ru and uk locales (do not include 'tr' locale changes)
    * AM/PM, midnight, noon formatting for a few Indian locales
    * Timezone name changes in Korean and Chinese locales
    * Default digit for Arabic locale is European digits.

  - patches/locale1.patch: Minor fixes for Korean


2. Breakiterator patches
  - patches/wordbrk.patch for word.txt
    a. Move full stops (U+002E, U+FF0E) from MidNumLet to MidNum so that
       FQDN labels can be split at '.'
    b. Move fullwidth digits (U+FF10 - U+FF19) from Ideographic to Numeric.
       See http://unicode.org/cldr/trac/ticket/6555

  - patches/khmer-dictbe.patch
    Adjust parameters to use a smaller Khmer dictionary (khmerdict.txt).
    https://unicode-org.atlassian.net/browse/ICU-9451

  - Add several common Chinese words that were dropped previously to
    source/data/cjdict/brkitr/cjdict.txt
    patch: patches/cjdict.patch
    upstream bug: https://unicode-org.atlassian.net/browse/ICU-10888

3. Timezone data update
  Run scripts/update_tz.sh to grab the latest version of the
  following timezone data files and put them in source/data/misc

     metaZones.txt
     timezoneTypes.txt
     windowsZones.txt
     zoneinfo64.txt

  As of Janurary 27, 2021, the latest version is 2021a and the above files
  are available at the ICU github repos.

4. Build-related changes

  - patches/configure.patch:
    * Remove a section of configure that will cause breakage while
      running runConfigureICU.

  - patches/wpo.patch (only needed when icudata dll is used).
    upstream bugs : https://unicode-org.atlassian.net/browse/ICU-8043
                    https://unicode-org.atlassian.net/browse/ICU-5701

  - patches/data_symb.patch :
      Put ICU_DATA_ENTRY_POINT(icudtXX_dat) in common when we use
      the icu data file or icudt.dll

5. ISO-2022-JP encoding (fromUnicode) change per WHATWG encoding spec.
  - patches/iso2022jp.patch
  - upstream bug:
    https://unicode-org.atlassian.net/browse/ICU-20251

6. Enable tracing of file but not resource, only for Chromium
    to reduce performance impact/risk.
  - patches/restrace.patch

7. Patch Arabic date time pattern back to 67 value to avoid test
   breakage in
   third_party/blink/web_tests/fast/forms/datetimelocal/datetimelocal-appearance-l10n.html
  - patches/ardatepattern.patch
  - https://bugs.chromium.org/p/chromium/issues/detail?id=1139186

8. Patch fix for CFI due to unnecessary cast
  - patches/fixCFI.patch
  - updatream PR:
    unicode-org/icu#1440
  - updtream bug:
    https://unicode-org.atlassian.net/browse/ICU-21373

9. Patch fix for > 8 items in ListFormatter Memory READ
  - patches/listformatmemread.patch
  - updatream PR:
    unicode-org/icu#1450
  - updtream bug:
    https://unicode-org.atlassian.net/browse/ICU-21383

10. Patch fix for Locale::setKeywordValue
    patches/setkeywordvalue.patch
  - updatream PR:
    unicode-org/icu#1461
  - updtream bug:
    https://unicode-org.atlassian.net/browse/ICU-21385

11. Patch Windows to fix Host timezone detection
    patches/wintz.patch
    patches/wintz2.patch
  - updatream PR:
    unicode-org/icu#1465
    unicode-org/icu#1539
  - updtream bug:
    https://unicode-org.atlassian.net/browse/ICU-21392
    https://unicode-org.atlassian.net/browse/ICU-21465

12. Patch fixing crash in list format
    patches/formatted_string_builder.patch
  - updatream PR:
    unicode-org/icu#1479
  - updtream bug:
    https://unicode-org.atlassian.net/browse/ICU-21410

13. Patch locale canoncalization fixes
    patches/loc.patch
  - updatream PR:
    unicode-org/icu#1470
    unicode-org/icu#1475
    unicode-org/icu#1478
    unicode-org/icu#1491
    unicode-org/icu#1504
  - updtream bug:
    https://unicode-org.atlassian.net/browse/ICU-21401
    https://unicode-org.atlassian.net/browse/ICU-21402
    https://unicode-org.atlassian.net/browse/ICU-21408
    https://unicode-org.atlassian.net/browse/ICU-21406
    https://unicode-org.atlassian.net/browse/ICU-21414
    https://unicode-org.atlassian.net/browse/ICU-21433

14. Patch locale canoncalization fixes which is part of 68.2
    patches/loc2.patch
  - updatream PR:
    unicode-org/icu#1502
  - updtream bug:
    https://unicode-org.atlassian.net/browse/ICU-21430

15. Remove explicit std::atomic<NumberRangeFormatterImpl*> template
    instantiation
    patches/atomic_template_instantiation.patch
  - The explicit instantiation was added to silence MSVC C4251 warnings:
    https://unicode-org.atlassian.net/browse/ICU-20157
    Small test cases show that it is generally an error to instantiate
    std::atomic<T*> with an incomplete type T with MSVC, clang, and GCC, so this
    instantiation never should have worked:
    https://gcc.godbolt.org/z/34xx8h
    At this time, it's not clear if this particular instantiation with
    NumberRangeFormatterImpl* was ever necessary for MSVC. Further testing with
    MSVC is required to upstream this patch.

About

For floating patches on top of https://chromium.googlesource.com/chromium/deps/icu.git

License:Other


Languages

Language:C++ 78.2%Language:C 18.0%Language:Makefile 1.4%Language:Python 0.7%Language:Shell 0.4%Language:HTML 0.3%Language:M4 0.3%Language:Perl 0.3%Language:Roff 0.3%Language:Objective-C 0.0%Language:Batchfile 0.0%Language:CSS 0.0%Language:sed 0.0%