nielstron / quantulum3

Library for unit extraction - fork of quantulum for python3

KeyError for consecutive numbers and apostrophe

timkersch opened this issue

Describe the bug
When executing e.g. parser.parse("1 '2"), a KeyError is raised as follows:

>>> parser.parse("1 '2", verbose=True)
2020-09-30 14:29:53,819 --- Verbose mode
2020-09-30 14:29:53,820 --- Original text: "1 '2"
2020-09-30 14:29:53,824 --- Clean text: "1 '2"
2020-09-30 14:29:53,825 --- Text after numeric conversion: "1 '2"
2020-09-30 14:29:53,825 --- Quantity found: {'value': '1', 'unit1': "'2"}
2020-09-30 14:29:53,826 --- After exponent resolution: 1
2020-09-30 14:29:53,826 ---     Uncertainty: None
2020-09-30 14:29:53,827 ---     Values: [1.0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tike/anaconda3/envs/test-quant/lib/python3.6/site-packages/quantulum3/parser.py", line 447, in parse
    unit, unit_shortening = get_unit(item, text)
  File "/home/tike/anaconda3/envs/test-quant/lib/python3.6/site-packages/quantulum3/parser.py", line 330, in get_unit
    base = dis.disambiguate_unit(unit_surface, text, lang)
  File "/home/tike/anaconda3/envs/test-quant/lib/python3.6/site-packages/quantulum3/disambiguate.py", line 19, in disambiguate_unit
    base = clf.disambiguate_unit(unit_surface, text, lang).name
  File "/home/tike/anaconda3/envs/test-quant/lib/python3.6/site-packages/quantulum3/classifier.py", line 256, in disambiguate_unit
    raise KeyError('Could not find unit "%s" from "%s"' % (unit, text))
KeyError: 'Could not find unit "" from "1 \'2"'

Raising a KeyError for this input is undesired.

To Reproduce
See above

Expected behavior
A KeyError should not be raised.

Additional information:

  • Python Version: [e.g. 3.6.9]
  • Classifier activated/ sklearn installed: [yes]
  • Version [e.g. 0.7.5]

This error is not raised when scikit-learn is not installed.
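
Since the behavior depends on whether scikit-learn is present, it helps to confirm which case applies before trying to reproduce. A small sketch using only the standard library follows; classifier_dependency_available is a hypothetical name, and whether quantulum3 itself detects scikit-learn this way is not stated in this issue.

```python
import importlib.util


def classifier_dependency_available(module_name="sklearn"):
    """Return True if the given top-level module can be imported.

    "sklearn" is the import name of the scikit-learn package.
    Hypothetical helper for illustration, not part of quantulum3.
    """
    return importlib.util.find_spec(module_name) is not None


if classifier_dependency_available():
    print("scikit-learn found: the KeyError path may be reachable")
else:
    print("scikit-learn not found: the error is not expected")
```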

Thanks for raising this issue! This will likely be fixed together with #157.