Specify an order for preferred interpretations as low-key disambiguation
EdwardChamberlain opened this issue
Describe the bug
The shorthand notation for inch (") is detected but parsed as second (of arc). While technically valid, the more common use of " is to mean inch.
To Reproduce
Steps to reproduce the behaviour:
>>> from quantulum3 import parser
>>> s = 'Its about 24" long'
>>> quants = parser.parse(s)
>>> print(s)
Its about 24" long
>>> print(quants)
[Quantity(24, "Unit(name="second of arc", entity=Entity("angle"), uri=Minute_and_second_of_arc)")]
Expected behavior
I would expect it to default to interpreting " as inches rather than seconds.
Screenshots
N/A
Additional information:
- Python Version: 3.7
- Classifier activated/ sklearn installed: yes
- OS: linux
- Version: 0.7.3
Additional context
Is there any way to force an override on this?
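Until the library offers such an option, one possible workaround is to rewrite the ambiguous symbol before parsing. This is a minimal sketch, not a quantulum3 feature: it assumes that in your input every number immediately followed by `"` (or a double prime `″`) is meant as inches, and the helper name `normalize_inches` is hypothetical. It replaces the ambiguous symbol with the unambiguous word "inch", and the rewritten string can then be passed to `parser.parse` as in the example above.

```python
import re

# Matches a number (optionally with a decimal part) immediately followed by a
# straight double quote (") or a double prime (″). No whitespace is allowed in
# between, so ordinary quoted words such as 3 "apples" are left alone.
_INCH_MARK = re.compile(r'(\d+(?:\.\d+)?)["″]')


def normalize_inches(text: str) -> str:
    """Rewrite 24" or 24″ as '24 inch' (hypothetical helper).

    Assumes every number-plus-quote in the text is an inch measurement,
    which is exactly the preference the library does not make by default.
    """
    return _INCH_MARK.sub(r"\1 inch", text)


print(normalize_inches('Its about 24" long'))  # Its about 24 inch long
```

The trade-off is that the override applies to the whole text: any genuine arc-second values written as `"` would also be turned into inches.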
> the more common use of " is to mean inch.
Do you have some source for this claim? Since this tool should be as general as possible, I would prefer to keep the choice between ambiguous units random when the disambiguation system is not in use.
Relatedly, the system includes a pretrained classifier for disambiguation. However, without any context, it is unlikely to determine the correct unit.
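To illustrate why context matters, here is a toy keyword-overlap scorer. It is a deliberately simplified stand-in, not the pretrained classifier that quantulum3 ships with; the keyword lists and the function name `score_by_context` are made up for illustration. When a sentence contains no telling words, both interpretations score zero and there is nothing to base a decision on.

```python
from typing import Dict

# Hypothetical context keywords for each reading of the " symbol.
CONTEXT_WORDS = {
    "inch": {"long", "wide", "tall", "screen", "length", "diameter"},
    "second of arc": {"angle", "declination", "star", "telescope", "arc"},
}


def score_by_context(sentence: str) -> Dict[str, int]:
    """Count how many context keywords of each candidate unit appear."""
    words = set(sentence.lower().split())
    return {unit: len(words & keywords) for unit, keywords in CONTEXT_WORDS.items()}


print(score_by_context('Its about 24" long'))  # {'inch': 1, 'second of arc': 0}
print(score_by_context('Measured 24"'))        # {'inch': 0, 'second of arc': 0}
```

In the second sentence the scores tie, which is the situation where any classifier, however it is trained, is reduced to guessing.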
Noted: a way to pass in an ordering of preferred and less-preferred interpretations for some symbols could be added.
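A minimal sketch of what such an ordering could look like, assuming the parser can expose the list of candidate unit names for an ambiguous symbol. Neither `resolve_unit` nor the `preferences` mapping exists in quantulum3; both are hypothetical. Candidates named in the user's ordering win in that order, and anything the ordering does not cover falls back to a random choice, matching the current behavior described in this thread.

```python
import random
from typing import Dict, List, Optional


def resolve_unit(
    symbol: str,
    candidates: List[str],
    preferences: Dict[str, List[str]],
    rng: Optional[random.Random] = None,
) -> str:
    """Pick one interpretation of an ambiguous symbol (hypothetical API).

    The first unit in preferences[symbol] that is also a candidate wins.
    If the user gave no usable preference, choose randomly, as the parser
    does today when disambiguation is off.
    """
    for preferred in preferences.get(symbol, []):
        if preferred in candidates:
            return preferred
    # No applicable preference: fall back to an unbiased random pick.
    return (rng or random).choice(candidates)


prefs = {'"': ["inch", "second of arc"]}
print(resolve_unit('"', ["second of arc", "inch"], prefs))  # inch
```

Passing a seeded `random.Random` as `rng` keeps the fallback reproducible in tests, while the default keeps the library's neutral behavior for symbols the user has no opinion on.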
> Do you have some source for this claim?
Sure:
> The inch (abbreviation: in or ″)
Source: https://en.wikipedia.org/wiki/Inch (first line)
It sometimes seems to pick up inch correctly when using ", but I'm not sure how to reproduce that yet.
Yes, of course " is an abbreviation for inch, but I wanted to know whether there is a source for the claim that " more frequently means "inch" than "second" :)
The tool knows that " is an abbreviation for inch, just as it knows that " is an abbreviation for seconds; however, there is no order of preference for which interpretation to choose. If it picks up " as inch, that may be due to the context of the sentence, but it may also be (something very close to) pure luck, especially if disambiguation is not enabled.