LHNCBC / ucum-lhc

LHC implementation of UCUM validation and conversion services

Unexpected results from validating or conversion of unit expressions

incansvl opened this issue

Note: these results were taken from the interactive demo available at https://ucum.nlm.nih.gov/ucum-lhc/demo.html. That page links to this repository, so I assume the code running at that URL is essentially the same as the code hosted here.

Results from running certain test cases through the tool were not as expected. Note that these were deliberately chosen to be "challenging" for UCUM given some of its characteristics (notably not using the same base units as SI), but even so some of these results look wrong.

| Item | Operation | Input | Output (Result) | Comment |
|------|-----------|-------|-----------------|---------|
| 1 | validate | (mL/L).[pH] | ([pH]*[pH]) is a valid unit expression | mL/L is unitless (dimension of 1), and pH is unitless in the SI system (but a complex function in UCUM). It is unclear how mL/L is converted to a unit of "pH" in the parsing of this expression. |
| 2 | Convert to kg | mmol/mol | .000001000000 kg (kilogram) | Moles are unitless in UCUM, so an expression involving multiplying or dividing moles should also be unitless. It should not be possible to convert such a value to a result involving mass (kg). |
| 3 | Convert to mol | mmol/L | 1.00 mol (mole) | As moles are unitless, this value should have a dimension of 1/volume, e.g. L-1 |
| 4 | Convert to mol | 32.4 (ug/g).mg | 0.00000000 mol (mole) | Incorrect result dimension (should be mass) |
| 5 | Convert to mol | 78.2 (mmol/L)/s | 78.20 mol (mole) | Incorrect result dimension (should be L-1s-1) |
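The dimensional reasoning in the comments for items 2 to 5 can be checked mechanically. The following is a minimal sketch, not the ucum-lhc implementation: it represents each unit as a set of exponents over an assumed base set (length, mass, time) and, following UCUM, gives mol no dimension at all. The `UNIT_DIMS` table and the `dimension` and `convertible` helpers are hypothetical names that cover only the units used in these test cases.

```python
from collections import defaultdict

# Hypothetical unit table for this sketch: each unit maps to exponents over
# the assumed base dimensions "length", "mass" and "time".
# Following UCUM, mol is a pure number, so it carries no dimension.
UNIT_DIMS = {
    "mol": {}, "mmol": {},
    "L": {"length": 3}, "mL": {"length": 3},
    "kg": {"mass": 1}, "g": {"mass": 1}, "mg": {"mass": 1}, "ug": {"mass": 1},
    "s": {"time": 1},
}

def dimension(numerator, denominator=()):
    """Exponents of (product of numerator units) / (product of denominator units)."""
    dims = defaultdict(int)
    for sign, units in ((1, numerator), (-1, denominator)):
        for unit in units:
            for base, exp in UNIT_DIMS[unit].items():
                dims[base] += sign * exp
    # Drop cancelled dimensions so a dimensionless result compares equal to {}.
    return {base: exp for base, exp in dims.items() if exp != 0}

def convertible(source, target):
    """Two unit expressions are convertible only if their dimensions match."""
    return source == target

# Items 2 to 5 from the table: (source dimension, target dimension).
cases = {
    "2: mmol/mol -> kg": (dimension(["mmol"], ["mol"]), dimension(["kg"])),
    "3: mmol/L -> mol": (dimension(["mmol"], ["L"]), dimension(["mol"])),
    "4: (ug/g).mg -> mol": (dimension(["ug", "mg"], ["g"]), dimension(["mol"])),
    "5: (mmol/L)/s -> mol": (dimension(["mmol"], ["L", "s"]), dimension(["mol"])),
}
for label, (src, dst) in cases.items():
    verdict = "convertible" if convertible(src, dst) else "NOT convertible"
    print(label, src, dst, verdict)
```

Under these assumptions all four conversions are rejected, which is the behaviour I expected from the tool.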

Thanks for reporting this problem. You are correct that this is the code running at https://ucum.nlm.nih.gov/ucum-lhc/demo.html, and I have confirmed the problem. It seems to have trouble with dimensionless units for some reason. I will see if we can prioritize getting this fixed. In the meantime, we also have another tool which is a web API for validating and converting UCUM, which does not seem to have this problem. (I tried a couple of the cases). That tool is documented here: https://ucum.nlm.nih.gov/ucum-service.html.

I agree that dimensionless units seem to be the trigger for the issue. However, I'm not sure the web service is entirely immune. E.g., if we take the last example:

curl -H "Accept: text/plain"  https://ucum.nlm.nih.gov/ucum-service/v1/ucumtransform/from/(78.2).(mmol/L)/s/to/L-1.s-1

Error: Source and Target unit do not seem to belong to the same property

Also, if the demo web pages and the web service don't follow identical logic when performing validations and conversions, which of these code paths would be used if client code called the API directly?

The demo web page is a demo of the ucum-lhc library (https://github.com/lhncbc/ucum-lhc), which is a completely different code base from the web API service, which was donated to us by Jozef Aerts (of http://www.xml4pharma.com/). We had some thought of rewriting the API service to use our ucum-lhc library, but that would probably not happen in the near term.

The issue you found with the web API service is by design. It deviates from the UCUM standard by adding two new base units, mol and IU (Jozef's decision, though I am inclined to agree with it). See the notes on "Conversions using moles" on the page https://ucum.nlm.nih.gov/ucum-service.html. I think the motivating example for this change is that one would not want "mol/L" to convert to "/L".
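To make the difference between the two conventions concrete, here is a rough sketch. The web service code is not public, so how it models mol internally is an assumption; the `Unit` class and the `make_units`, `divide` and `convert` helpers are hypothetical. Under plain UCUM, mol is the dimensionless number 6.0221367 × 10^23, so "mol/L" converts to "/L" with that factor. With mol as an extra base dimension, the same conversion is rejected.

```python
from dataclasses import dataclass

AVOGADRO_UCUM = 6.0221367e23  # the number UCUM assigns to mol

@dataclass(frozen=True)
class Unit:
    factor: float  # size relative to the coherent base units (m, mol)
    dims: tuple    # exponents for (length, amount)

ONE = Unit(1.0, (0, 0))  # the dimensionless unit "1"

def make_units(mol_is_base):
    """Return (mol, litre) under the chosen convention (assumed modelling)."""
    if mol_is_base:
        mol = Unit(1.0, (0, 1))            # mol carries its own dimension
    else:
        mol = Unit(AVOGADRO_UCUM, (0, 0))  # plain UCUM: mol is just a number
    litre = Unit(1e-3, (3, 0))             # 1 L = 1e-3 m3
    return mol, litre

def divide(a, b):
    """Quotient of two units: divide factors, subtract exponents."""
    return Unit(a.factor / b.factor, tuple(x - y for x, y in zip(a.dims, b.dims)))

def convert(value, source, target):
    """Convert a value between units, refusing mismatched dimensions."""
    if source.dims != target.dims:
        raise ValueError("source and target units do not belong to the same property")
    return value * source.factor / target.factor

for mol_is_base in (False, True):
    mol, litre = make_units(mol_is_base)
    try:
        result = convert(1.0, divide(mol, litre), divide(ONE, litre))
        print("mol_is_base =", mol_is_base, "-> 1 mol/L =", result, "/L")
    except ValueError as err:
        print("mol_is_base =", mol_is_base, "-> error:", err)
```

The same mechanism would explain the error reported above for converting (mmol/L)/s to L-1.s-1: once mol has its own dimension, the source and target no longer match.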

Yesterday we pushed out a new version of the ucum-lhc library with fixes for these issues and updated the demo website, so these problems should be resolved. Thanks again for reporting them. If you still see problems, feel free to reopen or start a new issue.

Thanks for the fast response, much appreciated.

This does raise some questions.

I had not realised that the demo web page and the web service were running different code bases. Is the code for the web service publicly available for review?

Re:

> It deviates from the UCUM standard by adding two new base units, mol and IU (Jozef's decision, though I am inclined to agree with it).

  1. It is my understanding from reading the UCUM license terms that changing the specifications in the unit table, including adding, deleting, or modifying individual units is specifically prohibited. The only modification allowed is localisation of names.

On the specifics-
1a) I would agree with adding mol as a base unit. Ideally a variant of UCUM that is fully SI-compliant would be created, but as a starting point re-adding the SI base units that have been supplanted in UCUM would help.

1b) I think I'd disagree with making IU a base unit. In fact, IU are potentially problematic and need a "here be dragons" flag, but as a minimum they need to be treated as arbitrary, which means they effectively belong to their own system of units. Even this may not be enough to fully deal with them, however.

However points 1a and 1b are moot if my reading of the UCUM license terms is correct.

The web API service code is not public. We do not usually make code that is for a web application running on one of our servers public because we don't want to provide clues for hackers to find vulnerabilities in our server-side applications.

If we made public our version of the UCUM data file with the change to make mol and IU base units, then at that point we might be out of compliance with the license, as you point out, though I am not sure of that. However, since the change is internal to the behavior of our web application, I do not see an issue, except that the documentation for the web service (https://ucum.nlm.nih.gov/ucum-service.html) should probably be revised to make that change more apparent. I do not see it mentioning the change for IU.

I am not very familiar with the IU unit. Are there problems with it that are specifically introduced by the change to make it a base unit?

Re. licensing: I'm not a lawyer, but I was basing my comment on the following clause in the Terms of Use:

2) Users shall not modify the Licensed Materials **and** may not distribute modified versions of the UCUM table (regardless of format) or UCUM Specification. Users shall not modify any existing contents, fields, description, or comments of the Licensed Materials, and may not add any new contents to it.

The bolding of the "and" is mine, but I read this as saying that users are prohibited BOTH from modifying the Licensed Materials, AND SEPARATELY prohibited from distributing the modified materials. This may or may not be enforceable, but I think that is what it's trying to say.

Re. International Units: it is a complex area that I'm slowly getting to grips with.

Firstly, being a base unit is a characteristic of a unit WITHIN a system of units. For example, the kilogram is a base unit in the SI system, but not in the cgs system. Derived units within the system are specified in terms of the base units.

Conversely, arbitrary units (units with the arbitrary flag set in the UCUM database) are not members of a system at all, or alternatively you can consider each one a member of a unique system with only one member (itself). For example, Bodansky Units [BDU] are arbitrary: they are in their own little system and can't be converted to or from any other unit.

IUs are arbitrary, so making them a base unit is redundant, because there are no other members of the same system that could be converted to/from IUs.

The reason I said IUs are scary (and the same applies to the unit called "Unit", where just the name is a nightmare...) is that they are "super arbitrary". Not only are they arbitrary, but the same name is used to describe multiple different tests that have (as I understand it) unrelated scales! I haven't fully worked out the implications of this, but it would seem to mean that dimensional analysis involving them is unsafe, because multiple instances of "IU" in an analysis can't be assumed to be equivalent.
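One possible guard, sketched here purely as an illustration and not taken from either code base, is to tag every arbitrary quantity with the context it was measured in and to refuse arithmetic across contexts. The `Quantity` class, its `context` field and the `add` helper are hypothetical; the insulin and vitamin D labels are only there to show two uses of "IU" with unrelated scales.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Quantity:
    value: float
    unit: str
    arbitrary: bool = False
    # Hypothetical field: the substance or assay an arbitrary unit refers to.
    context: Optional[str] = None

def add(a, b):
    """Add two quantities, refusing to mix arbitrary units across contexts."""
    if a.unit != b.unit:
        raise ValueError(f"cannot combine {a.unit} with {b.unit} in this sketch")
    if a.arbitrary or b.arbitrary:
        # The same unit name is not enough: both values must come from the
        # same, explicitly stated context.
        if a.context is None or a.context != b.context:
            raise ValueError(f"{a.unit} is arbitrary; values are only comparable within one context")
    return Quantity(a.value + b.value, a.unit, a.arbitrary, a.context)

insulin_a = Quantity(10.0, "[IU]", arbitrary=True, context="insulin")
insulin_b = Quantity(5.0, "[IU]", arbitrary=True, context="insulin")
vitamin_d = Quantity(400.0, "[IU]", arbitrary=True, context="vitamin D")

print(add(insulin_a, insulin_b))  # same context, so the sum is allowed
try:
    add(insulin_a, vitamin_d)     # same unit name, different context
except ValueError as err:
    print("rejected:", err)
```

Whether a context tag like this is enough to make analysis with IUs safe is exactly the open question; the sketch only shows that the unit name alone cannot be trusted.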