LHNCBC / ucum-lhc

LHC implementation of UCUM validation and conversion services

Missing restrictions on character validity

mjszczep opened this issue

Per UCUM 2.1.6.1, "The full range of characters 33–126 can be used within a pair of curly braces (‘{’ and ‘ }’)." However, it doesn't look like the validator throws an error against annotations containing characters outside this range (like spaces and extended ASCII characters):

image

Note that extended ASCII characters aren't allowed anywhere in UCUM anyway: "All expressions of The Unified Code for Units of Measure shall be built from characters of the 7-bit US-ASCII character set exclusively." (UCUM 2.1.3.1)
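To make the two rules quoted above concrete, here is a minimal Python sketch of the checks being asked for. It is not taken from ucum-lhc; the function names are hypothetical, and it assumes the annotation body has already been extracted from between the braces.

```python
def out_of_range_chars(annotation_body: str) -> list[str]:
    """Return the characters that fall outside the 33-126 range
    quoted from UCUM 2.1.6.1 (hypothetical helper, not ucum-lhc API)."""
    return [ch for ch in annotation_body if not 33 <= ord(ch) <= 126]


def is_seven_bit_ascii(expression: str) -> bool:
    """Check the UCUM 2.1.3.1 rule that a whole expression uses
    only 7-bit US-ASCII characters (hypothetical helper)."""
    return expression.isascii()


if __name__ == "__main__":
    # A space is character 32, so it is reported as out of range.
    print(out_of_range_chars("spec grav"))
    # The micro sign is not 7-bit ASCII, so this expression fails.
    print(is_seven_bit_ascii("\u00b5g"))
```

The first check applies only inside curly braces, while the second applies to the expression as a whole.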

Also, thanks for this tool!

Thanks for bringing that to our attention. Yes, I think our validator is just ignoring everything inside the annotations, but I agree with you that it should be checking to make sure the characters are in the right range. I will add this to our task list.

Oh, rather than rely on "the full range of characters 33-126," it would probably make more sense to rely on the state diagram (looks like the link on that site is broken, but you can also see it here), which specifies the regex [!-z|~]*\}.

For example, UCUM 2.1.6.5 says that "curly braces must not be nested," and true to that, the specified regex does indeed disallow the expression {{} (even though { is char 123 and right now the validator incorrectly allows it). So at least when it comes to characters within curly braces, it would probably be better to just use the regex.
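As a rough illustration of the regex approach, here is a minimal Python sketch. It is not the ucum-lhc implementation; the names ANNOTATION_PATTERN and is_valid_annotation are hypothetical, and the opening brace is prepended to the regex from the state diagram so that a complete annotation can be matched.

```python
import re

# The character class [!-z|~] covers characters 33-122 plus '|' (124)
# and '~' (126), so it leaves out '{' (123) and '}' (125).
# The leading \{ is added here as an assumption, so the pattern
# matches a whole annotation, including both braces.
ANNOTATION_PATTERN = re.compile(r"\{[!-z|~]*\}")


def is_valid_annotation(text: str) -> bool:
    """Return True if text is a single well-formed annotation
    (hypothetical helper, not ucum-lhc API)."""
    return ANNOTATION_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["{RBC}", "{{}", "{spec grav}"]:
        print(candidate, is_valid_annotation(candidate))
```

Because the class excludes both brace characters, a nested brace is rejected without a separate nesting check, and a space (character 32) is rejected because it falls below the start of the range.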

Thanks for the link and for pointing out the regex. I think we probably only need to add such validation for the annotations, since the other strings will need to be valid units anyway.

Yep, agreed. That regex is specific to validating annotations.

+1, I just ran into the same issue today, with a good-looking but invalid {spec grav} unit that was entered by our lab specialists. Adding a validation error would be a good improvement.

It is always good to hear that more than one user has the same issue, as it does help us prioritize. Anyway, I was planning to work on this this week, but other tasks on my plate have taken longer than expected. I expect that within a couple of weeks from now, I will have a fix, and maybe within three weeks it will be released. (BTW, if either of you were inclined to dig into the code and submit a pull request, that would be welcome and would speed things up.)

Just FYI, I've started on this, and I have the fix working in my area, but it still might be a week or two before it is out and the website is updated.

Thanks, great!

This came together faster than I expected. ucum-lhc 4.1.8 is published with this fix, and the demo website and the web service APIs are updated.

Thanks, I used it to check our units, and it works as expected.

Looks good to me too!