nielstron / quantulum3

Library for unit extraction - fork of quantulum for python3

Does not segment units without numbers

averbitsky opened this issue

Describe the bug
The quantulum3 Python library does not extract units that appear without a number. It should be able to parse a unit whether or not a number accompanies it. When the string contains a unit but no number, parser.parse returns an empty list. When the string contains a number and a unit but the number sits outside the parentheses around the unit, it returns an incorrect result. The expected result is returned only when the number and unit appear together, as in "2 g/day".

To Reproduce

  1. Run the following code:
from quantulum3 import parser
quants = parser.parse('intake (g/day)')
print(quants)

Observe that the output is an empty list [].

  2. Run the following code:
quants = parser.parse('intake 2 (g/day)')
print(quants)

Observe that the output is incorrect; the unit is dropped and 2 is returned as a dimensionless quantity: [Quantity(2, "Unit(name="dimensionless", entity=Entity("dimensionless"), uri=Dimensionless_quantity)")].

  3. Run the following code:
quants = parser.parse('intake (2 g/day)')
print(quants)

Observe that the output is as expected: [Quantity(2, "Unit(name="gram per day", entity=Entity("mass flow"), uri=None)")].
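Because the third input parses correctly, one possible stopgap for the second form is to move the number next to its unit before calling parser.parse. The helper below, attach_parenthesized_units, is hypothetical and not part of quantulum3; it is only a regex rewrite of "N (unit)" into "N unit". Whether quantulum3 then returns gram per day for "intake 2 g/day" is an untested assumption. The pattern is also naive: it would rewrite "CO2 (ppm)" into "CO2 ppm" as well.

```python
import re

# Hypothetical pre-processing helper (not part of quantulum3): joins a number
# with a unit that directly follows it in parentheses, e.g. "2 (g/day)" -> "2 g/day".
_NUMBER_THEN_PAREN_UNIT = re.compile(r"(\d+(?:\.\d+)?)\s*\(([^()]+)\)")


def attach_parenthesized_units(text: str) -> str:
    """Rewrite 'N (unit)' as 'N unit' so the number and unit are adjacent."""
    return _NUMBER_THEN_PAREN_UNIT.sub(r"\1 \2", text)


if __name__ == "__main__":
    print(attach_parenthesized_units("intake 2 (g/day)"))  # intake 2 g/day
    # Inputs that already keep number and unit together are left unchanged:
    print(attach_parenthesized_units("intake (2 g/day)"))  # intake (2 g/day)
    # The rewritten string could then be handed to quantulum3 (result unverified):
    # from quantulum3 import parser
    # parser.parse(attach_parenthesized_units("intake 2 (g/day)"))
```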

Expected behavior
The library should be able to parse units without numbers and return the correct result. In the first example, the expected output should be a quantity with the correct unit (e.g., gram per day) but without a specified number.
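To illustrate the requested behavior, here is a minimal sketch of what a number-less result could look like. UnitMention and extract_bare_units are hypothetical names, not quantulum3 API, and UnitMention is not quantulum3's Quantity class. The sketch treats any parenthesized group without digits as a unit string; it does not check the candidate against a list of known units, which a real implementation would need to do (otherwise text such as "(see above)" would also match).

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UnitMention:
    """Hypothetical result type: a unit with an optional numeric value."""
    value: Optional[float]  # None when the text contains no number
    unit: str               # raw unit text, e.g. "g/day"


# Matches a parenthesized group containing no digits, such as "(g/day)".
_BARE_PAREN_UNIT = re.compile(r"\(([^()\d]+)\)")


def extract_bare_units(text: str) -> List[UnitMention]:
    """Return every digit-free parenthesized group as a unit with value None."""
    return [
        UnitMention(value=None, unit=match.group(1).strip())
        for match in _BARE_PAREN_UNIT.finditer(text)
    ]


if __name__ == "__main__":
    print(extract_bare_units("intake (g/day)"))
    # [UnitMention(value=None, unit='g/day')]
```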

Screenshots
N/A

Additional information:
Python Version: 3.8.16
Classifier activated / sklearn installed: Yes
OS: macOS
Device: Mac Apple M1 Pro
quantulum3 Version: 0.8.1

Additional context
This issue occurs when trying to parse units without numbers or with incorrect formatting. The library should be more flexible and robust in handling such inputs.

quants = parser.parse('intake 2 (g/day)')

This looks like an interesting case that we might want to handle correctly. Do you have a source document you took this from?

quants = parser.parse('intake (g/day)')

This is currently not covered by quantulum and would be a bigger change. Quantulum currently focuses on numbers and, where possible, tries to attach a unit to them to create a quantity. What you are proposing is that units without numbers could also be returned. This is certainly interesting.