nielstron / quantulum3

Library for unit extraction - fork of quantulum for python3

Trying to parse inches using double-quote symbol throws ImportError about stemming

adam-funk opened this issue

Describe the bug
Parsing a string in which a double quote directly follows a number (meaning inches) throws an ImportError about a missing stemming module.

To Reproduce

>>> import quantulum3
>>> from quantulum3 import parser as qp
>>> qp.parse('supplied with 3.5" guidewire')
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    qp.parse('supplied with 3.5" guidewire')
  File "/home/adam/anaconda3/envs/scne/lib/python3.8/site-packages/quantulum3/parser.py", line 450, in parse
    unit, unit_shortening = get_unit(item, text)
  File "/home/adam/anaconda3/envs/scne/lib/python3.8/site-packages/quantulum3/parser.py", line 328, in get_unit
    base = dis.disambiguate_unit(unit_surface, text, lang)
  File "/home/adam/anaconda3/envs/scne/lib/python3.8/site-packages/quantulum3/disambiguate.py", line 18, in disambiguate_unit
    base = clf.disambiguate_unit(unit_surface, text, lang).name
  File "/home/adam/anaconda3/envs/scne/lib/python3.8/site-packages/quantulum3/classifier.py", line 258, in disambiguate_unit
    transformed = classifier(lang).tfidf_model.transform([clean_text(text, lang)])
  File "/home/adam/anaconda3/envs/scne/lib/python3.8/site-packages/quantulum3/classifier.py", line 100, in clean_text
    return _get_classifier(lang).clean_text(text)
  File "/home/adam/anaconda3/envs/scne/lib/python3.8/site-packages/quantulum3/_lang/en_US/classifier.py", line 24, in clean_text
    raise ImportError("Module stemming is not installed.")
ImportError: Module stemming is not installed.
>>> qp.parse('supplied with 3.5 inch guidewire')
[Quantity(3.5, "Unit(name="inch", entity=Entity("length"), uri=Inch)")]

Expected behavior

  1. not throwing the exception
  2. identifying number 3.5
  3. ideally, identifying the unit as inch

Additional information:

  • Python Version: 3.8.10 in anaconda (tested in idle3 and jupyter notebook)
  • Classifier activated/ sklearn installed: [yes/no]
  • OS: Ubuntu 21.04
  • quantulum3 0.7.9
  • sklearn 0.24.2
  • scipy 1.7.1
  • numpy 1.20.3

Does installing stemming resolve the issue? I'm not sure, but it may actually be required by the package.
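For anyone hitting the same traceback, here is a minimal sketch for checking up front whether the module is importable in the active environment. It uses only the standard library; the helper name missing_modules is hypothetical and not part of quantulum3, and it only handles top-level module names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "stemming" is the module named in the ImportError above;
    # "sklearn" is listed in the report's version information.
    missing = missing_modules(["stemming", "sklearn"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All checked modules are importable.")
```

If stemming shows up as missing, installing it into the same environment should avoid the ImportError raised in clean_text.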

It fixes the exception, thanks! The unit isn't always right, but I see that it's using context.

>>> import stemming
>>> import quantulum3
>>> from quantulum3 import parser as qp
>>> qp.parse('supplied with 3.5" guidewire')
[Quantity(3.5, "Unit(name="second of arc", entity=Entity("angle"), uri=Minute_and_second_of_arc)")]
>>> qp.parse('supplied with 3.5" long guidewire')
[Quantity(3.5, "Unit(name="second of arc", entity=Entity("angle"), uri=Minute_and_second_of_arc)")]
>>> qp.parse('supplied with 3.5" wide guidewire')
[Quantity(3.5, "Unit(name="inch", entity=Entity("length"), uri=Inch)")]
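Since the spelled-out form '3.5 inch' was parsed as inch in the original report, one possible workaround is to rewrite the double-quote notation before calling the parser. This is only a sketch under that assumption: normalize_inches is a hypothetical helper, not part of quantulum3, and it treats every straight double quote that directly follows a number as an inch mark.

```python
import re

# A number (integer or decimal) immediately followed by a straight double quote.
_INCH_MARK = re.compile(r'(\d+(?:\.\d+)?)"')


def normalize_inches(text):
    """Rewrite 3.5" as 3.5 inch so the unit is spelled out before parsing."""
    return _INCH_MARK.sub(r"\1 inch", text)


print(normalize_inches('supplied with 3.5" guidewire'))
# prints: supplied with 3.5 inch guidewire
```

The normalized string can then be passed to qp.parse. Note that this would also rewrite a closing quotation mark that happens to follow a number, so it only suits text where the quote is known to mean inches.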

Closing this now :)