BorisMoore / jsrender

A lightweight, powerful and highly extensible templating engine. In the browser or on Node.js, with or without jQuery.

Home Page:http://www.jsviews.com

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

unicode characters in template expressions

johan-ohrn opened this issue · comments

We have some autogenerated code and for whatever reasons we have javascript objects with property names using non english characters such as åäö.
When we want to print a property from such an object we would type something like this in our template:
{{:myObj.someÅpropÄerty}}

This doesn't work because the javascript generated from this template expression is invalid.
+((v=data.myObj.someÅdata.propÄdata.erty)!=null?v:"");

I managed to track down the problem to a couple of regexes using \w character class, which doesn't support anything but ascii characters.

I made the following changing in jsrender.js - we use the separate files
Lines 54-57:

    uW = "\\w\\u00E4\\u00E5\\u00F6\\u00C4\\u00C5\\u00D6",

    //rPath = /^(!*?)(?:null|true|false|\d[\d.]*|([\w$]+|\.|~([\w$]+)|#(view|([\w$]+))?)([\w$.^]*?)(?:[.[^]([\w$]+)\]?)?)$/g,
    rPath = new RegExp("^(!*?)(?:null|true|false|\\d[\\d.]*|([" + uW + "$]+|\\.|~([" + uW + "$]+)|#(view|([" + uW + "$]+))?)([" + uW + "$.^]*?)(?:[.[^]([" + uW + "$]+)\\]?)?)$", "g"),
    //        not                               object     helper    view  viewProperty pathTokens      leafToken


    //rParams = /(\()(?=\s*\()|(?:([([])\s*)?(?:(\^?)(~?[\w$.^]+)?\s*((\+\+|--)|\+|-|~(?![\w$_])|&&|\|\||===|!==|==|!=|<=|>=|[<>%*:?\/]|(=))\s*|(!*?(@)?[#~]?[\w$.^]+)([([])?)|(,\s*)|(\(?)\\?(?:(')|("))|(?:\s*(([)\]])(?=[.^]|\s*$|[^([])|[)\]])([([]?))|(\s+)/g,
    rParams = new RegExp("(\\()(?=\\s*\\()|(?:([([])\\s*)?(?:(\\^?)(~?[" + uW + "$.^]+)?\\s*((\\+\\+|--)|\\+|-|~(?![" + uW + "$_])|&&|\\|\\||===|!==|==|!=|<=|>=|[<>%*:?\\/]|(=))\\s*|(!*?(@)?[#~]?[" + uW + "$.^]+)([([])?)|(,\\s*)|(\\(?)\\\\?(?:(')|(\"))|(?:\\s*(([)\\]])(?=[.^]|\\s*$|[^([])|[)\\]])([([]?))|(\\s+)", "g"),

Seems kinda ugly to hard code the extra non-english characters like this. Perhaps you could make this user configurable or give unicode more generic support?

Even if using non-english characters for property names seem like a bad idea it's still valid javascript and as such I feel the same expression should be valid inside the template.

As a workarround I know we can type
{{:myObj["someÅpropÄerty"]}}
but this is ugly.

What's your take on the whole?

I forgot to mention this is on version 0.9.91 but the code is identical in version 1.0.0.

It looks like providing generic support for unicode property names would be difficult without significant perf / file size cost, at least judging by this https://mothereff.in/js-variables#%E0%B2%A0_%E0%B2%A0, and the code behind it.

Providing extensibility is possible, so people can add specific unicode characters they want to support, like you did, via

$.views.settings.advanced({addJsNameChars: "\\u00E4\\u00E5\\u00F6\\u00C4\\u00C5\\u00D6");

or similar.

But given the fact that you can do {{:myObj["someÅpropÄerty"]}}, as you say, I'm not sure about adding the setting... It still requires setting charactars manually, character by character...

I think a setting would be valuable for people in this situation. I agree that it's clumsy to have to individually add support for each character but at least it's a one time configuration.

In this particular case the offending property names are auto generated resource keys. We have a resource object like this:

resources: {
  för_lågt_antal_angivet: 'Entered quantity is to low',
  ...
}

In our view models we write code such as
displayError(resources.för_lågt_antal_angivet)

and in our templates we do the same:

{{if quantity < 4}}
  {{:~resources.för_lågt_antal_angivet}}
{{/if}}

Having to use one syntax in the javascript files and an other in the template (sometimes) is unintuitive.
Also developers are more comfortable with the dot notation instead of the index notation to access properties.

What's worse is when we need to change a resource. A developer would just copy/paste the new resource key in the template without considering that a property with non english characters need to be typed using the index notation. And because the change is so small it's also likely that it's not tested. It's just expected to work. In this contrived example there is an {{if}} surrounding the expression making it even more likely that this particular code path is not tested.

My 2 cents is that even though there is a viable workaround for the issue a setting would make this kind of code less error prone since you don't have to remember a special rule and more consistent with how you write code in the javascript viewmodel.

Also a setting like in the example you gave would allow for a small plugin/extension such as jsrender.latin1.js or whatever pre defined charset such that users wouldn't have to manually configure this.

Yes, I am in two minds about doing this, given the test matrix, need to document etc. but FWIW the feature would be like this:

Additional code at line 164

var uW = value && value.addedJsNameChars;
if (uW) {
	uW = "[\\w" + uW + "$";
	rPath = new RegExp("^(!*?)(?:null|true|false|\\d[\\d.]*|(" + uW + "]+|\\.|~(" + uW + "]+)|#(view|(" + uW + "]+))?)(" + uW + ".^]*?)(?:[.[^](" + uW + "]+)\\]?)?)$", "g");
	rParams = new RegExp("(\\()(?=\\s*\\()|(?:([([])\\s*)?(?:(\\^?)(~?" + uW + ".^]+)?\\s*((\\+\\+|--)|\\+|-|~(?!" + uW + "_])|&&|\\|\\||===|!==|==|!=|<=|>=|[<>%*:?\\/]|(=))\\s*|(!*?(@)?[#~]?" + uW + ".^]+)([([])?)|(,\\s*)|(\\(?)\\\\?(?:(')|(\"))|(?:\\s*(([)\\]])(?=[.^]|\\s*$|[^([])|[)\\]])([([]?))|(\\s+)", "g");
}
return value

(Here it is: jsviews.js.txt)

Usage:

$.views.settings.advanced({addedJsNameChars: "\\u00E4\\u00E5\\u00F6\\u00C4\\u00C5\\u00D6"});
var data = {foo.someÅpropÄerty:444};
...

Thanks. I'll try it out and will keep an eye out on this thread to see where it ends up.

Ah, I had missed your comment above (the page had not been refreshed when I wrote my response above.)

Yes, I hear you... I'll let it mature a bit before deciding. Meantime, if you can test fully with the update I included above that would be helpful. Let me know if you see any issues/bugs.

I have an alternative which I am looking at. I'll probably send you a new patch to test, in the coming days, rather than testing the version above...

Sounds good. In the meantime I use this quick fix.

Here is an update:
jsviews.js.txt

I managed to figure out a way of allowing all unicode chars, without needing to specifically specify the character codes. The default is not to support unicode chars in names, but you can write

$.views.settings.advanced({unicodeChars: true}); // support all unicode chars in JavaScript names

or else limit to specific ones:

$.views.settings.advanced({unicodeChars: "\\u00E4\\u00E5\\u00F6\\u00C4\\u00C5\\u00D6"});

You can't have unicode characters on helpers, or helper properties, tag names or tag properties, etc. Only on properties of data passed in to the link() or render() method.

The generic unicode version, passing true, is done basically by replacing

/[\w]*/

by

/[^\s~/\x00-\x23\x25-\x2D\x3A-\x40[\]\\`{}|]*/

so word characters are any characters other than the specified ones.

It adds 1% to the filesize (of jsrender.min.js), which I hesitate about, but at least the usage is easy now. (Assuming it works correctly!)

So if you are able to test it , it would be helpful, particularly to make sure it does indeed cover all unicode characters as expected (or at least test with a good range of characters...).

I am still considering whether or not to publish this feature...

I was able to create a test html page. Download here (sorry about the adds and wait time to download)

I extracted every character from the regex.
The only character that appears to be invalid in fact in chrome is "ⸯ" which I have commented out in the html file.

I tested this with the jsviews version you provided. It throws an error.

I will test some more to see if I can isolate which characters are the culprit.

Excellent. That is very helpful!

I did some testing with that, and all the characters seem to work. There was still an error from JsViews because of {{:aⸯa}} still being in the template markup. So I removed it.

Also, of course you need the line:

$.views.settings.advanced({unicodeChars: true});

before calling $.templates(...)

I also had some issues when using a file that did not have the BOM (with UTF-8 encoding).

But with those changes, the template rendered completely...

Here is an updated jsviews.js and jsrender.js and a copy of my version of your test page:

jsviews.js.txt
jsrender.js.txt
testUnicode.html.txt

The test page works for me in Chrome and in Firefox, but not in IE, Edge, Safari... It seems that some of the unicode characters trigger javascript errors in those browsers. With a reduced set of unicode characters it works in all those browsers.

I was at the end of my shift and had been messing back and forth with this test page so I missed the obvious $.views.settings.advanced(...) call :)

Good that you made it work!

So we confirmed that the regex does indeed cover all unicode characters.
The fact that it covers more than some browsers doesn't really matter I think.

Also since the regex does cover all unicode characters I don't see the need for a user to explicitly allow specific characters. Unicode on or off is sufficient. This should also result in a lower footprint for the feature.

If the option is to allow unicode or not then I see two possible implementations, well three.

  1. Making unicode opt-in via $.views.settings.advanced({unicodeChars: true});
    This alternative should ensure that no existing templates are broken by an unexpected bug in the new regex. Also adds more bytes to the final js file.
  2. Somehow allow setting the full rPath and rParams regexes from outside jsviews.js. This would allow you to make unicode support opt-in from a jsviews-unicode.js file moving the overhead completely from jsviews.js itself.
  3. Skip the option alltogether and just change the regex thus making unicode supported out of the box.

What's your take on these ideas?

Yes, I agree with your analysis.

Option 2 is probably my preference.

Option 3 makes me nervous about possible bugs, especially doing it just after releasing v1.0.0 which is hopefully stable.

So option2 is a more conservative approach, given that this feature is unlikely to be used/needed by many users...

I'll post a version of approach 2 soon.

Here is an update where you load the jsrender-unicode.js file after jsrender.js/jsviews.js, before calling $.templates(...) to compile any templates that have unicode characters in them.

Let me know if it works as expected. Thanks!

jsrender-unicode.js.txt
jsrender.js.txt
jsviews.js.txt

I gave it a try but run into some issues with jsrender-unicode.js

The regexes in jsrender-unicode.js is not the same as those we used here
Is this intentional?
Modifying jsrender-unicode.js to use those regexes seem to work (comments left out for abbrevity):

$.views.sub.rPath =
    /^(!*?)(?:null|true|false|\d[\d.]*|([^.^\s~/\x00-\x23\x25-\x2D\x3A-\x40[\]\\`{}|]+|\.|~([\w$]+)|#(view|([\w$]+))?)([^\s~/\x00-\x23\x25-\x2D\x3A-\x40[\]\\`{}|]*?)(?:[.[^]([^.^\s~/\x00-\x23\x25-\x2D\x3A-\x40[\]\\`{}|]+)\]?)?)$/g;
$.views.sub.rPrm =
  /(\()(?=\s*\()|(?:([([])\s*)?(?:(\^?)(~?[\w$.^]+)?\s*((\+\+|--)|\+|-|~(?![\w$])|&&|\|\||===|!==|==|!=|<=|>=|[<>%*:?/]|(=))\s*|(!*?(@)?[#~]?[^\s~/\x00-\x23\x25-\x2D\x3A-\x40[\]\\`{}|]+)([([])?)|(,\s*)|(\(?)\\?(?:(')|(\"))|(?:\s*(([)\]])(?=[.^]|\s*$|[^([])|[)\]])([([]?))|(\s+)/g;

I also found some left over code and double comments (line numbers from jsrender.js):
line 114:
// subscription, e.g. JsViews integration
line 161:

				var unicodeChars = value && value.unicodeChars;
				if (unicodeChars) {
					setPathRegEx(unicodeChars);
				}

You might want to double check any changes related to this issue.

Ah - sorry, I must have accidentally uploaded an intermediate version. Thanks for pointing out the stale code/comments, too!

So hopefully this is correct now. The regEx expressions are close to yours, but not quite identical. Let me know if you have any concerns, or see any issues.

jsrender-unicode.js.txt
testUnicode.html.txt
jsviews.js.txt
jsrender.js.txt

I finally got to implement the fix in our code. Works perfectly so far!
I'll test it some more and see if anything turns up.

I'll include this in the next update (v1.0.1)

Glad to hear.
No problems so far.

Fixed in commit v1.0.1