nielstron / quantulum3

Library for unit extraction - fork of quantulum for python3

Very slow performance

EdwardChamberlain opened this issue

Describe the bug
parser.parse() is excessively slow. Parsing the string "That plank is 2 inches long" takes approx. 2.65 s on an RPi 3B+, measured with perf_counter.

To Reproduce
Steps to reproduce the behavior:

  1. Import the parser.
  2. Parse the string "That plank is 2 inches long".
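
A minimal sketch of how such a measurement might be taken. The helper name time_calls is an assumption made for illustration and is not part of quantulum3; only parser.parse and perf_counter come from the report above. Timing several calls in a row shows whether the delay is per call or only on the first one.

```python
from time import perf_counter


def time_calls(fn, arg, repeats=3):
    """Return the wall-clock duration in seconds of each successive call fn(arg)."""
    durations = []
    for _ in range(repeats):
        start = perf_counter()
        fn(arg)
        durations.append(perf_counter() - start)
    return durations


if __name__ == "__main__":
    try:
        from quantulum3 import parser  # requires quantulum3 to be installed
    except ImportError:
        print("quantulum3 is not installed; nothing to time")
    else:
        timings = time_calls(parser.parse, "That plank is 2 inches long")
        for i, seconds in enumerate(timings, 1):
            print(f"call {i}: {seconds:.3f} s")
```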

Expected behavior
Snappier performance.

Additional information:

  • Python Version: 3.7
  • Classifier activated / sklearn installed: Yes
  • OS: Raspbian
  • Version: Latest

Additional context
This delay makes parsing in realtime impossible.

On the first call, the library loads all entities (and, if sklearn is installed, the classifier as well) into memory. If you run several parses in a row, each should take significantly less than 2 seconds (in fact, the delay should not be noticeable) from the second call onwards.

If not, please contact me again.

Also, note taken: maybe there should be a way to trigger the loading manually at startup, so that all calls during normal execution are fast.
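
Until such a function exists, one possible workaround is to issue a throwaway parse at startup so that the one-time loading happens before the first real request. This is only a sketch: warm_up and LazyParser are hypothetical names introduced here for illustration, and LazyParser merely stands in for a parser that loads its data on first use when quantulum3 is not available.

```python
def warm_up(parse_fn, sample="1 m"):
    """Call parse_fn once so lazy loading happens now, not on the first real request."""
    parse_fn(sample)


class LazyParser:
    """Hypothetical stand-in for a parser that loads its data on first use."""

    def __init__(self):
        self.loaded = False

    def parse(self, text):
        if not self.loaded:
            # Expensive one-time setup would happen here.
            self.loaded = True
        return text.split()


if __name__ == "__main__":
    try:
        from quantulum3 import parser  # the real parser, if installed
    except ImportError:
        parser = LazyParser()  # fall back to the stand-in
    warm_up(parser.parse)  # pay the start-up cost once, up front
```

The sample string is arbitrary; any short input should do, since the goal is only to trigger the loading.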

commented

You're right - after the first call it is much quicker: 0.016 s, which is pretty quick!

I am getting another error when running the program. Shall I open another issue?

/home/pi/.local/lib/python3.7/site-packages/sklearn/base.py:334: UserWarning: Trying to unpickle estimator TfidfTransformer from version 0.21.3 when using version 0.23.1. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
/home/pi/.local/lib/python3.7/site-packages/sklearn/base.py:334: UserWarning: Trying to unpickle estimator TfidfVectorizer from version 0.21.3 when using version 0.23.1. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
/home/pi/.local/lib/python3.7/site-packages/sklearn/utils/deprecation.py:143: FutureWarning: The sklearn.linear_model.stochastic_gradient module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.linear_model. Anything that cannot be imported from sklearn.linear_model is now part of the private API.
  warnings.warn(message, FutureWarning)
/home/pi/.local/lib/python3.7/site-packages/sklearn/utils/deprecation.py:143: FutureWarning: The sklearn.linear_model.sgd_fast module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.linear_model. Anything that cannot be imported from sklearn.linear_model is now part of the private API.
  warnings.warn(message, FutureWarning)
/home/pi/.local/lib/python3.7/site-packages/sklearn/base.py:334: UserWarning: Trying to unpickle estimator SGDClassifier from version 0.21.3 when using version 0.23.1. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
2020-05-28 23:08:30,686 --- The classifier was built using a different scikit-learn version (=0.21.3, !=0.23.1). The disambiguation tool could behave unexpectedly. Consider running classifier.train_classfier()

This is merely a hint to the package maintainer to rebuild the bundled classifier with a newer version of scikit-learn. A rebuilt classifier will be included in the next version (starting a build now). In the meantime, you can either retrain the classifier as the log message suggests or install a matching version of scikit-learn (0.21.3, according to the warnings).
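
To see at a glance whether the installed scikit-learn matches the version the classifier was built with, a check along these lines may help. It is a sketch under assumptions: check_sklearn_version and BUILT_WITH are hypothetical names, and the value 0.21.3 is taken from the log output in this thread, so it applies only to the release discussed here.

```python
BUILT_WITH = "0.21.3"  # version reported in the log for the bundled classifier


def check_sklearn_version(installed, built_with=BUILT_WITH):
    """Return None if the versions match, otherwise a hint naming both options."""
    if installed == built_with:
        return None
    return (
        f"Installed scikit-learn {installed} differs from {built_with}: "
        f"either install scikit-learn=={built_with} or retrain the classifier."
    )


if __name__ == "__main__":
    try:
        import sklearn
    except ImportError:
        print("scikit-learn is not installed")
    else:
        print(check_sklearn_version(sklearn.__version__) or "versions match")
```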