papandreou / node-cldr

node.js library for extracting data from CLDR (the Unicode Common Locale Data Repository)

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Wrong spellout cardinal for et locale

papandreou opened this issue · comments

From: papandreou/inter#6 (reported by @ragulka)

console.log(cldr.extractRbnfFunctionByType('et').renderSpelloutCardinal(2439871));

Gives: üks miljonit nelisada ükskümmend üheksa tuhat kolmkümmend üks

The above result is totally messed up. It should say:
kaks miljonit nelisada kolmkümmend üheksa tuhat kaheksasada seitsekümmend üks

Basically, translated to english, it is currently saying:
One millione four-hunderd onety-nine-thousand thrity-one

I guess there must be a serious flaw in the cardinal function generator.

Confirmed (as for the expected Estonian output, I'm taking your word for it :).

After stepping through the generated code by hand, I'm afraid I misread the spec when I implemented it.

@ragulka, could you take a look at the source data and tell me approximately where it goes wrong? https://github.com/papandreou/node-cldr/blob/master/3rdparty/cldr/common/rbnf/et.xml

@papandreou I could, but I'm not sure what you mean by "where it goes wrong". The spellout for estonian that I posted originally, is wrong from the first letter. It should start with "kaks miljonit", but it starts with "üks miljon".

I mean try reading the rbnf rules I linked to and point out where the code is picking the wrong production. It's not that hard to "hand run" it when you know the language. I tried and failed with your example, but I have no trouble with Danish, English, and German rbnfs.

Here's the generated code:

console.log(require('./lib/cldr').extractRbnfFunctionByType('et').renderSpelloutCardinal.toString())
function () {
    var isFractional = n !== Math.floor(n);
    if (n < 0) return "miinus " + this.renderSpelloutCardinal(-n);
    if (isFractional && n > 1) return this.renderSpelloutCardinal(Math.floor(n)) + " koma " + this.renderSpelloutCardinal(parseInt(String(n).replace(/\d*\./, ""), 10));
    if (n >= 1e18) return this.renderNumber(n, "#,##0");
    if (n >= 2e15) return this.renderSpelloutCardinal(Math.floor(n / 2e15)) + " biljardit" + (n === 2e15 ? "" : " " + this.renderSpelloutCardinal(n % 1e15));
    if (n >= 1e15) return this.renderSpelloutCardinal(Math.floor(n / 1e15)) + " biljard" + (n === 1e15 ? "" : " " + this.renderSpelloutCardinal(n % 1e14));
    if (n >= 2e12) return this.renderSpelloutCardinal(Math.floor(n / 2e12)) + " biljonit" + (n === 2e12 ? "" : " " + this.renderSpelloutCardinal(n % 1e12));
    if (n >= 1e12) return this.renderSpelloutCardinal(Math.floor(n / 1e12)) + " biljon" + (n === 1e12 ? "" : " " + this.renderSpelloutCardinal(n % 1e11));
    if (n >= 2e9) return this.renderSpelloutCardinal(Math.floor(n / 2e9)) + " miljardit" + (n === 2e9 ? "" : " " + this.renderSpelloutCardinal(n % 1e9));
    if (n >= 1e9) return this.renderSpelloutCardinal(Math.floor(n / 1e9)) + " miljard" + (n === 1e9 ? "" : " " + this.renderSpelloutCardinal(n % 1e8));
    if (n >= 2e6) return this.renderSpelloutCardinal(Math.floor(n / 2e6)) + " miljonit" + (n === 2e6 ? "" : " " + this.renderSpelloutCardinal(n % 1e6));
    if (n >= 1e6) return this.renderSpelloutCardinal(Math.floor(n / 1e6)) + " miljon" + (n === 1e6 ? "" : " " + this.renderSpelloutCardinal(n % 1e5));
    if (n >= 1e3) return this.renderSpelloutCardinal(Math.floor(n / 1e3)) + " tuhat" + (n === 1e3 ? "" : " " + this.renderSpelloutCardinal(n % 100));
    if (n >= 100) return this.renderSpelloutCardinal(Math.floor(n / 100)) + "sada" + (n === 100 ? "" : " " + this.renderSpelloutCardinal(n % 100));
    if (n >= 20) return this.renderSpelloutCardinal(Math.floor(n / 20)) + "kümmend" + (n === 20 ? "" : " " + this.renderSpelloutCardinal(n % 10));
    if (n >= 11) return this.renderSpelloutCardinal(n % 10) + "teist";
    if (n >= 10) return "kümme";
    if (n >= 9) return "üheksa";
    if (n >= 8) return "kaheksa";
    if (n >= 7) return "seitse";
    if (n >= 6) return "kuus";
    if (n >= 5) return "viis";
    if (n >= 4) return "neli";
    if (n >= 3) return "kolm";
    if (n >= 2) return "kaks";
    if (n >= 1) return "üks";
    if (n >= 0) return "null"
};

The rules and the generated code seem fine to me. I am not sure where it goes wrong, then. Can there be an issue with the number that's being passed to it?

Can there be an issue with the number that's being passed to it?
I don't think so. How could there be?

The bug has to be somewhere in that generated code.

Maybe you could try single-stepping through the below?

var rbnf = {
    renderNumber: function (n) {
        return n.toFixed(3);
    },
    renderSpelloutCardinal = function (n) {
        var isFractional = n !== Math.floor(n);
        if (n < 0) return "miinus " + this.renderSpelloutCardinal(-n);
        if (isFractional && n > 1) return this.renderSpelloutCardinal(Math.floor(n)) + " koma " + this.renderSpelloutCardinal(parseInt(String(n).replace(/\d*\./, ""), 10));
        if (n >= 1e18) return this.renderNumber(n, "#,##0");
        if (n >= 2e15) return this.renderSpelloutCardinal(Math.floor(n / 2e15)) + " biljardit" + (n === 2e15 ? "" : " " + this.renderSpelloutCardinal(n % 1e15));
        if (n >= 1e15) return this.renderSpelloutCardinal(Math.floor(n / 1e15)) + " biljard" + (n === 1e15 ? "" : " " + this.renderSpelloutCardinal(n % 1e14));
        if (n >= 2e12) return this.renderSpelloutCardinal(Math.floor(n / 2e12)) + " biljonit" + (n === 2e12 ? "" : " " + this.renderSpelloutCardinal(n % 1e12));
        if (n >= 1e12) return this.renderSpelloutCardinal(Math.floor(n / 1e12)) + " biljon" + (n === 1e12 ? "" : " " + this.renderSpelloutCardinal(n % 1e11));
        if (n >= 2e9) return this.renderSpelloutCardinal(Math.floor(n / 2e9)) + " miljardit" + (n === 2e9 ? "" : " " + this.renderSpelloutCardinal(n % 1e9));
        if (n >= 1e9) return this.renderSpelloutCardinal(Math.floor(n / 1e9)) + " miljard" + (n === 1e9 ? "" : " " + this.renderSpelloutCardinal(n % 1e8));
        if (n >= 2e6) return this.renderSpelloutCardinal(Math.floor(n / 2e6)) + " miljonit" + (n === 2e6 ? "" : " " + this.renderSpelloutCardinal(n % 1e6));
        if (n >= 1e6) return this.renderSpelloutCardinal(Math.floor(n / 1e6)) + " miljon" + (n === 1e6 ? "" : " " + this.renderSpelloutCardinal(n % 1e5));
        if (n >= 1e3) return this.renderSpelloutCardinal(Math.floor(n / 1e3)) + " tuhat" + (n === 1e3 ? "" : " " + this.renderSpelloutCardinal(n % 100));
        if (n >= 100) return this.renderSpelloutCardinal(Math.floor(n / 100)) + "sada" + (n === 100 ? "" : " " + this.renderSpelloutCardinal(n % 100));
        if (n >= 20) return this.renderSpelloutCardinal(Math.floor(n / 20)) + "kümmend" + (n === 20 ? "" : " " + this.renderSpelloutCardinal(n % 10));
        if (n >= 11) return this.renderSpelloutCardinal(n % 10) + "teist";
        if (n >= 10) return "kümme";
        if (n >= 9) return "üheksa";
        if (n >= 8) return "kaheksa";
        if (n >= 7) return "seitse";
        if (n >= 6) return "kuus";
        if (n >= 5) return "viis";
        if (n >= 4) return "neli";
        if (n >= 3) return "kolm";
        if (n >= 2) return "kaks";
        if (n >= 1) return "üks";
        if (n >= 0) return "null"
    }
};

console.log(rbnf(2439871));

Okay, here are my initial findings:

  • When a string representing a number is supplied, the results are unexpected. I think there should be a Number(n); in the beginning of the function just to make sure we are dealing with the number type
  • Estonian spellout starts going wrong with this line:

javascript
if (n >= 20) return renderSpelloutCardinal(Math.floor(n / 20)) + "kümmend" + (n === 20 ? "" : " " + renderSpelloutCardinal(n % 10));


It always produces the wrong result. The division should be with 10 instead of 20. When I changed it, it seemed the problem with that part.
- This line:

``` javascript
    if (n >= 1e3) return renderSpelloutCardinal(Math.floor(n / 1e3)) + " tuhat" + (n === 1e3 ? "" : " " + renderSpelloutCardinal(n % 100));

is using a wrong modulus. It should be n % 1e3 instead of n % 100.

  • Everything above thousands is using a wrong division when dealing with plurals, for example:
    if (n >= 2e9) return renderSpelloutCardinal(Math.floor(n / 2e9)) ... should be if (n >= 2e9) return renderSpelloutCardinal(Math.floor(n / 1e9))

Here's what the complete spellout cardinal function should look like for estonian language:

renderSpelloutCardinal = function (n) {
    var isFractional = n !== Math.floor(n);
    if (n < 0) return "miinus " + renderSpelloutCardinal(-n);
    if (isFractional && n > 1) return renderSpelloutCardinal(Math.floor(n)) + " koma " + renderSpelloutCardinal(parseInt(String(n).replace(/\d*\./, ""), 10));
    if (n >= 1e18) return renderNumber(n, "#,##0");
    if (n >= 2e15) return renderSpelloutCardinal(Math.floor(n / 1e15)) + " biljardit" + (n === 2e15 ? "" : " " + renderSpelloutCardinal(n % 1e15));
    if (n >= 1e15) return renderSpelloutCardinal(Math.floor(n / 1e15)) + " biljard" + (n === 1e15 ? "" : " " + renderSpelloutCardinal(n % 1e14));
    if (n >= 2e12) return renderSpelloutCardinal(Math.floor(n / 1e12)) + " biljonit" + (n === 2e12 ? "" : " " + renderSpelloutCardinal(n % 1e12));
    if (n >= 1e12) return renderSpelloutCardinal(Math.floor(n / 1e12)) + " biljon" + (n === 1e12 ? "" : " " + renderSpelloutCardinal(n % 1e11));
    if (n >= 2e9) return renderSpelloutCardinal(Math.floor(n / 1e9)) + " miljardit" + (n === 2e9 ? "" : " " + renderSpelloutCardinal(n % 1e9));
    if (n >= 1e9) return renderSpelloutCardinal(Math.floor(n / 1e9)) + " miljard" + (n === 1e9 ? "" : " " + renderSpelloutCardinal(n % 1e8));
    if (n >= 2e6) return renderSpelloutCardinal(Math.floor(n / 1e6)) + " miljonit" + (n === 2e6 ? "" : " " + renderSpelloutCardinal(n % 1e6));
    if (n >= 1e6) return renderSpelloutCardinal(Math.floor(n / 1e6)) + " miljon" + (n === 1e6 ? "" : " " + renderSpelloutCardinal(n % 1e5));
    if (n >= 1e3) return renderSpelloutCardinal(Math.floor(n / 1e3)) + " tuhat" + (n === 1e3 ? "" : " " + renderSpelloutCardinal(n % 1e3));
    if (n >= 100) return renderSpelloutCardinal(Math.floor(n / 100)) + "sada" + (n === 100 ? "" : " " + renderSpelloutCardinal(n % 100));
    if (n >= 20) return renderSpelloutCardinal(Math.floor(n / 10)) + "kümmend" + (n === 20 ? "" : " " + renderSpelloutCardinal(n % 10));
    if (n >= 11) return renderSpelloutCardinal(n % 10) + "teist";
    if (n >= 10) return "kümme";
    if (n >= 9) return "üheksa";
    if (n >= 8) return "kaheksa";
    if (n >= 7) return "seitse";
    if (n >= 6) return "kuus";
    if (n >= 5) return "viis";
    if (n >= 4) return "neli";
    if (n >= 3) return "kolm";
    if (n >= 2) return "kaks";
    if (n >= 1) return "üks";
    if (n >= 0) return "null"
}

I tested the above function with several different numbers and it worked flawlessly each time.

Great, that's extremely helpful. That means there's at least one bug in https://github.com/papandreou/node-cldr/blob/master/lib/CldrRbnfRuleSet.js

I pushed a branch called issue12 with a failing test. You're very welcome to take a stab at it.

After checking out the branch and running run npm install --dev, you should be able to run npm test and get this output:

andreas@papandreou:~/work/node-cldr$ npm test

> cldr@2.1.0 test /home/andreas/work/node-cldr
> mocha --check-leaks


  ․․․․․․․․․․․․․․․․

  14 passing (91ms)
  1 pending
  1 failing

  1) extractRbnfFunctionByType #renderSpelloutCardinal should render a number correctly in Estonian (regression test for #12):

      Error: expected 'üks miljonit nelisada ükskümmend üheksa tuhat kolmkümmend üks' to equal 'kaks miljonit nelisada kolmkümmend üheksa tuhat kaheksasada seitsekümmend üks'
      + expected - actual

      +kaks miljonit nelisada kolmkümmend üheksa tuhat kaheksasada seitsekümmend üks
      -üks miljonit nelisada ükskümmend üheksa tuhat kolmkümmend üks

      at Context.<anonymous> (/home/andreas/work/node-cldr/test/extractRbnfFunctionByType.js:8:13)
      at Test.Runnable.run (/home/andreas/work/node-cldr/node_modules/mocha/lib/runnable.js:221:32)
      at Runner.runTest (/home/andreas/work/node-cldr/node_modules/mocha/lib/runner.js:374:10)
      at /home/andreas/work/node-cldr/node_modules/mocha/lib/runner.js:452:12
      at next (/home/andreas/work/node-cldr/node_modules/mocha/lib/runner.js:299:14)
      at /home/andreas/work/node-cldr/node_modules/mocha/lib/runner.js:309:7
      at next (/home/andreas/work/node-cldr/node_modules/mocha/lib/runner.js:247:23)
      at Object._onImmediate (/home/andreas/work/node-cldr/node_modules/mocha/lib/runner.js:276:5)
      at processImmediate [as _immediateCallback] (timers.js:330:15)



npm ERR! weird error 1
npm ERR! not ok code 0

Fixed in 2.1.1.

I added your complete expected function to the test (36c26a7), but had to change some divisors to make it all fit (441102b). Does that look OK?

Looks good now, thanks!

Great, thanks for the effort. Closing.