CLDR data as versioned “peer” dependency
rxaviers opened this issue · comments
Goal
- Libraries should be able to define what CLDR versions (the data, not cldrjs itself) they are compatible with.
- Tools should assist on (a) fetching the data and (b) dependency management.
Winner approaches
npm + custom-downloader
Ideal for backend applications.
An npm module that uses a custom downloader. See the cldr-data npm module. The implementation of these modules was inspired by phantomjs.
bower + post-install hook (or grunt task)
Ideal for frontend applications.
A bower module that contains only the CLDR data zip URLs (really light). It works as follows. A project foo depends on a variety of libraries that have different CLDR data requirements, each of which declares them by depending on the cldr-data bower module in its bower.json. When bower install is executed on project foo, it resolves and flattens the cldr-data versions required by each dependency and comes up with a single cldr-data version that accommodates them all. A bower postinstall hook (e.g., cldr-data-downloader) or a grunt task (grunt-cldr-data-downloader) can then be used to download the data and populate the bower_components/cldr-data skeleton.
See bower's cldr-data.
Unsuccessful approaches
npm mirror
An npm module (e.g., cldr-data) that contains all the CLDR JSON data. It follows the same version numbers as Unicode CLDR; for example, cldr-data v26 has the same data served by http://www.unicode.org/Public/cldr/26/json-full.zip.
Usage: a library defines the cldr-data dependency in its package.json:
"dependencies": {
"cldr-data": "26"
}
Pros
- Simplest solution using existing npm.
Cons
- Big download size. CLDR v26 zipped is 51 MB. If the module also offers JS (that wraps the JSON using cldrjs) along with the JSON files themselves, the size increases even more.
- No flat dependency tree.
npm + cherry-pick fetch
❗ fetching everything remotely takes way too long, see comment below.
An npm module (e.g., cldr-data) that follows the same version numbers as Unicode CLDR, but the module itself contains no CLDR data. It has an install.js script that npm executes during installation (via the scripts/install directive) and that fetches the needed files. It is a kind of variant of the phantomjs approach; see https://gist.github.com/rxaviers/87e089c35d46fd3a1492.
Usage: a library defines the cldr-data dependency in its package.json, and it also needs to define which CLDR data set to fetch.
"dependencies": {
"cldr-data": "26"
},
"_cldr": {
"locales": [ "en", "zh", "es", "ar" ],
"jsons": [
"main/ca-gregorian",
"supplemental/likelySubtags"
]
}
Pros
- Saner CLDR data download size.
Cons
- The need to whitelist CLDR sets. Obviously, we could also allow blacklists with "!", e.g. "!supplemental".
- No flat dependency tree.
Question
- Is the "_cldr" property in package.json the best place to keep that information?
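To make the whitelist/blacklist idea concrete, here is a minimal sketch of how an install script might expand a "_cldr" block into relative JSON paths. The helper name expandCldrSets is hypothetical, and the rules it applies are assumptions for illustration, not what any published module does: entries under "main/" are expanded once per locale, other entries map to a single file, and a leading "!" excludes everything under that path prefix.

```javascript
// Hypothetical helper (not part of cldr-data): expands a "_cldr" block
// into the relative JSON paths an install script would need to fetch.
function expandCldrSets(config) {
  var locales = config.locales || [];
  var entries = config.jsons || [];

  // Entries starting with "!" are treated as excluded path prefixes (assumption).
  var excluded = entries
    .filter(function (entry) { return entry.charAt(0) === "!"; })
    .map(function (entry) { return entry.slice(1); });
  var included = entries.filter(function (entry) {
    return entry.charAt(0) !== "!";
  });

  function isExcluded(entry) {
    return excluded.some(function (prefix) {
      return entry === prefix || entry.indexOf(prefix + "/") === 0;
    });
  }

  var paths = [];
  included.forEach(function (entry) {
    if (isExcluded(entry)) {
      return;
    }
    var parts = entry.split("/");
    if (parts[0] === "main") {
      // "main/<name>" is per-locale data: main/<locale>/<name>.json
      locales.forEach(function (locale) {
        paths.push("main/" + locale + "/" + parts.slice(1).join("/") + ".json");
      });
    } else {
      // Everything else (e.g., supplemental) is locale-independent.
      paths.push(entry + ".json");
    }
  });
  return paths;
}

// Usage with the "_cldr" block shown above.
console.log(expandCldrSets({
  locales: ["en", "zh", "es", "ar"],
  jsons: ["main/ca-gregorian", "supplemental/likelySubtags"]
}));
```

With the example block, this yields one main/<locale>/ca-gregorian.json path per locale plus supplemental/likelySubtags.json.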
bower mirror
A cldr-data repository that contains all the CLDR JSON data. It follows the same version numbers as Unicode CLDR; for example, cldr-data v26 has the same data served by http://www.unicode.org/Public/cldr/26/json-full.zip.
Usage: a library defines the cldr-data dependency in its bower.json:
"dependencies": {
"cldr-data": "26"
}
Pros
- Simplest solution using existing bower.
Cons
- Big download size. CLDR v26 zipped is 51 MB. If the module also offers JS (that wraps the JSON using cldrjs) along with the JSON files themselves, the size increases even more.
I've implemented a proof of concept for the "npm (custom fetch)" approach: https://gist.github.com/rxaviers/87e089c35d46fd3a1492.
The script fetches an initial URL (e.g., http://www.unicode.org/repos/cldr-aux/json/26/) and crawls its content looking for other URLs that match a glob pattern (e.g., http://www.unicode.org/repos/cldr-aux/json/26/main/*/numbers.json or http://www.unicode.org/repos/cldr-aux/json/26/**/numbers.json).
Conclusion: the content crawling itself is pretty quick. However, making multiple requests to crawl and fetch the above content takes way too long. In terms of speed, the simpler approach (fetching the whole set) is much better.
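For reference, the glob filtering step can be sketched as follows. This is not the code from the gist; globToRegExp and filterUrls are hypothetical names, and the sketch assumes the usual convention that "*" matches within a single path segment while "**" may span several.

```javascript
// Hypothetical sketch: turn a URL glob into a RegExp.
// Assumption: "*" stays within one path segment, "**" may cross "/".
function globToRegExp(glob) {
  var pattern = glob
    // Escape regex metacharacters, leaving "*" alone for now.
    .replace(/[.+^${}()|[\]\\?]/g, "\\$&")
    // Protect "**" with a placeholder before handling single "*".
    .replace(/\*\*/g, "\u0000")
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, ".*");
  return new RegExp("^" + pattern + "$");
}

// Keep only the crawled URLs that match the glob.
function filterUrls(urls, glob) {
  var re = globToRegExp(glob);
  return urls.filter(function (url) {
    return re.test(url);
  });
}

// Usage: the URLs below are illustrative, not actual crawl results.
var base = "http://www.unicode.org/repos/cldr-aux/json/26/";
var crawled = [
  base + "main/en/numbers.json",
  base + "main/en/currencies.json",
  base + "main/zh/numbers.json"
];
console.log(filterUrls(crawled, base + "main/*/numbers.json"));
```

The filter itself is cheap; as noted above, the cost is in the many HTTP requests needed to discover and fetch the URLs.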
I've mirrored the whole CLDR JSON v26 into a GitHub repository. Then I tried to publish it as an npm module, but it failed:
util.js:35
var str = String(f).replace(formatRegExp, function(x) {
^
RangeError: Maximum call stack size exceeded
Fetching the full mirror via bower works, but it's tedious.
I've just created a CLDR JSON downloader: https://github.com/rxaviers/cldr-data-downloader
@raphamorim yep. That script is my initial attempt to cherry-pick the files, that is, a custom downloader. But that didn't work well.
@rxaviers, I've only tested the guide in your README. Is the goal that, when this module is run, it automatically identifies and downloads the version defined in package.json?
Both cldr-data and cldr-data-full npm modules have been created. They address the goal of this issue as follows.
- Libraries should be able to define what CLDR versions (the data, not cldrjs itself) they are compatible with.
In an i18n library, define which CLDR versions it is compatible with in its package.json:
"dependencies": {
"cldr-data": ">26"
}
- Tools should assist on (a) fetching the data and (b) dependency management.
The appropriate CLDR JSON data will be fetched on npm install.
- cldr-data installs http://www.unicode.org/Public/cldr/26/json.zip;
- cldr-data-full installs http://www.unicode.org/Public/cldr/26/json-full.zip;
Node.js users can access the data by using require("cldr-data"):
var cldr = require("cldr-data");
var plurals = cldr("supplemental/plurals");
It's ideal to use cldr-data in conjunction with cldrjs:
var Cldr = require("cldrjs");
var cldr = require("cldr-data");
Cldr.load(cldr("supplemental/plurals"));
For more info, see the README.
Below is a comparison of installation times for the core coverage. Note that with the full coverage, the GitHub mirrors become unusable.
method | time |
---|---|
npm mirror | 3m57.674s |
bower mirror | 1m1.001s |
npm + custom-downloader | 0m8.958s |
bower + custom-downloader | 0m9.506s |
Below is the output I got from running each command. Feel free to execute them yourself.
npm mirror
$ time npm install rxaviers/cldr-data#b0.0.1
cldr-data@0.0.1-alpha.3 node_modules/cldr-data
real 3m57.674s
user 3m27.044s
sys 1m14.285s
bower mirror
$ time bower install rxaviers/cldr-data#b0.0.1
[?] May bower anonymously report usage statistics to improve the tool over time? No
bower not-cached git://github.com/rxaviers/cldr-data.git#b0.0.1
bower resolve git://github.com/rxaviers/cldr-data.git#b0.0.1
bower checkout cldr-data#b0.0.1
bower invalid-meta cldr-data is missing "main" entry in bower.json
bower invalid-meta cldr-data is missing "ignore" entry in bower.json
bower resolved git://github.com/rxaviers/cldr-data.git#1aeff0b182
bower install cldr-data#1aeff0b182
cldr-data#1aeff0b182 bower_components/cldr-data
real 1m1.001s
user 0m46.573s
sys 0m15.770s
npm + custom-downloader
$ time npm install cldr-data
\
> cldr-data@26.0.4 install /tmp/x/node_modules/cldr-data
> node install.js
GET `http://www.unicode.org/Public/cldr/26/json.zip`
[========================================] 100% 0.0s
Received 3425K total.
Unpacking it into `./json`
cldr-data@26.0.4 node_modules/cldr-data
└── cldr-data-downloader@0.1.0 (progress@1.1.8, q@1.0.1, request-progress@0.3.1, nopt@3.0.1, mkdirp@0.5.0, adm-zip@0.4.4, npmconf@2.0.9, request@2.44.0)
real 0m8.958s
user 0m7.022s
sys 0m0.978s
bower + custom-downloader
Requires setting up .bowerrc.
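A .bowerrc along these lines should reproduce the preinstall and postinstall steps visible in the output of this run. It is a sketch reconstructed from those two hook lines, using bower's "scripts" hooks; the exact file used for the measurement is not shown in this thread, so treat it as an assumption.

```json
{
  "scripts": {
    "preinstall": "npm install cldr-data-downloader",
    "postinstall": "node ./node_modules/cldr-data-downloader/bin/download.js -i bower_components/cldr-data/index.json -o bower_components/cldr-data/"
  }
}
```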
$ time bower install cldr-data
bower not-cached git://github.com/rxaviers/cldr-data-bower.git#*
bower resolve git://github.com/rxaviers/cldr-data-bower.git#*
bower download https://github.com/rxaviers/cldr-data-bower/archive/26.0.2.tar.gz
bower extract cldr-data#* archive.tar.gz
bower invalid-meta cldr-data is missing "ignore" entry in bower.json
bower resolved git://github.com/rxaviers/cldr-data-bower.git#26.0.2
bower preinstall npm install cldr-data-downloader
bower install cldr-data#26.0.2
bower postinstall node ./node_modules/cldr-data-downloader/bin/download.js -i bower_components/cldr-data/index.json -o bower_components/cldr-data/
cldr-data#26.0.2 bower_components/cldr-data
real 0m9.506s
user 0m7.855s
sys 0m1.135s