r-lyeh-archived / CLDR

Compact data from the Unicode Common Locale Data Repository

For anyone interested: I dumped most of the CLDR data into a compact form (see the provided CLDR.INI file).

The final data for all languages is 13,727,497 bytes and still compresses well, as shown below (ratio is the space saved relative to the raw size; times are in microseconds).

BSC:    1,023,137 bytes, ratio=92.5468% enctime=1211822us dectime=757387us
BROTLI: 1,287,148 bytes, ratio=90.6236% enctime=67236011us dectime=50243us
LZMA25: 1,369,212 bytes, ratio=90.0258% enctime=3961970us dectime=98609us
LZIP:   1,369,811 bytes, ratio=90.0214% enctime=3895218us dectime=131528us
LZMA20: 1,423,667 bytes, ratio=89.6291% enctime=3334007us dectime=103905us
MINIZ:  1,892,977 bytes, ratio=86.2103% enctime=697077us dectime=31830us
ZSTD:   2,108,053 bytes, ratio=84.6436% enctime=65694us dectime=34525us
LZ4HC:  2,151,652 bytes, ratio=84.3260% enctime=491871us dectime=13641us
LZ4:    2,918,991 bytes, ratio=78.7362% enctime=37851us dectime=13775us
RAW:   13,727,497 bytes, ratio=0% enctime=16242us dectime=7658us
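
The ratio figures above are consistent with "percentage of space saved", that is 100 * (1 - compressed / raw). A minimal C++ sketch of that arithmetic follows. The helper names `saved_percent` and `megabytes_per_second` are hypothetical and not part of this repo, and the throughput helper assumes each timing covers the whole 13,727,497-byte data set.

```cpp
#include <cstdint>

// Percentage of space saved by compression: 100 * (1 - packed / raw).
// Hypothetical helper, not taken from the repo.
double saved_percent(std::uint64_t raw_bytes, std::uint64_t packed_bytes) {
    return 100.0 * (1.0 - static_cast<double>(packed_bytes) /
                          static_cast<double>(raw_bytes));
}

// Throughput over the uncompressed size, in MB/s with 1 MB = 1,000,000 bytes.
// Bytes per microsecond is numerically the same as MB/s.
// Assumes the timing covers the whole data set.
double megabytes_per_second(std::uint64_t raw_bytes, std::uint64_t microseconds) {
    return static_cast<double>(raw_bytes) / static_cast<double>(microseconds);
}
```

For example, `saved_percent(13727497, 1023137)` reproduces the 92.5468% shown for BSC. Under the stated assumption, `megabytes_per_second(13727497, 34525)` puts ZSTD decoding at a little under 400 MB/s.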

This is what is currently processed from the CLDR repos:

  • skipped
  • extracted

cldr-core/supplemental/

  • aliases.json
  • calendarData.json
  • calendarPreferenceData.json
  • characterFallbacks.json
  • codeMappings.json
  • currencyData.json
  • gender.json
  • languageData.json
  • languageMatching.json
  • likelySubtags.json
  • measurementData.json
  • metaZones.json
  • numberingSystems.json
  • ordinals.json
  • parentLocales.json
  • plurals.json
  • primaryZones.json
  • references.json
  • telephoneCodeData.json
  • territoryContainment.json
  • territoryInfo.json (interesting!)
  • timeData.json
  • weekData.json
  • windowsZones.json

cldr-dates-modern\main\xx-XX

  • ca-generic.json
  • ca-gregorian.json
  • dateFields.json
  • timeZoneNames.json

cldr-localenames-modern\main\xx-XX

  • languages.json
  • localeDisplayNames.json
  • scripts.json
  • territories.json
  • transformNames.json
  • variants.json

cldr-misc-modern\main\xx-XX

  • characters.json
  • contextTransforms.json
  • delimiters.json
  • layout.json
  • listPatterns.json
  • posix.json

cldr-numbers-modern\main\xx-XX

  • currencies.json
  • numbers.json

cldr-segments-modern\segments\xx-XX

  • suppressions.json

Licenses

License: The Unlicense

