nielstron / quantulum3

Library for unit extraction - fork of quantulum for python3

Add recognition for <prefix><unit_surface> combinations

nielstron opened this issue · comments

Is your feature request related to a problem? Please describe.
As pointed out in a separate issue, the combination "<prefix><unit_surface>" (e.g. "Mlitre") is not uncommon. Support for recognizing it would therefore be desirable.

An example for the usage of such notation:
https://www.climate-policy-watcher.org/water-quality/v-1.html

Additional context
https://github.com/nielstron/quantulum3/pull/162/files#r533194554

A quick test in the branch feature-prefix-surface-recognition reveals that a bit more fine-tuning will be needed:

Currently, symbolic and non-symbolic units may also be mixed without spaces. This yields "metre litre per day" for the term "Mlitre/day" from the example above. A later manual override that converts "k" and "M" to 10^3/10^6 multipliers resolves this only partly.
The overall result (15*10^6 litre/day) is only partially desired, since we usually do not want to perform unit conversion, which happens implicitly here.
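
To illustrate the behaviour I have in mind, here is a minimal sketch of splitting a "<prefix><unit_surface>" token while keeping the multiplier separate from the value. It does not use quantulum3's internals: the tables `PREFIXES` and `UNIT_SURFACES` and the function `split_prefixed_surface` are hypothetical names made up for this example, and the prefix set is deliberately tiny.

```python
import re

# Hypothetical lookup tables, for illustration only (not quantulum3 data).
PREFIXES = {"k": 10**3, "M": 10**6, "G": 10**9}
UNIT_SURFACES = {"litre", "metre", "gram"}

# A prefix symbol immediately followed by a full unit surface, nothing else.
_PATTERN = re.compile(
    r"^(?P<prefix>%s)(?P<surface>%s)$"
    % (
        "|".join(re.escape(p) for p in PREFIXES),
        "|".join(re.escape(s) for s in sorted(UNIT_SURFACES, key=len, reverse=True)),
    )
)


def split_prefixed_surface(token):
    """Return (prefix, surface, factor), or None if token is not prefix+surface."""
    match = _PATTERN.match(token)
    if match is None:
        return None
    prefix = match.group("prefix")
    return prefix, match.group("surface"), PREFIXES[prefix]


print(split_prefixed_surface("Mlitre"))  # ('M', 'litre', 1000000)
print(split_prefixed_surface("litre"))   # None
```

With the factor reported alongside the unit rather than multiplied into the value, "15 Mlitre/day" could stay a value of 15 with a prefixed unit, instead of being silently converted to 15*10^6 litre/day. Matching is case-sensitive here, so a leading "M" is read as mega and is not confused with the symbol "m" for metre.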