nielstron / quantulum3

Library for unit extraction - fork of quantulum for python3

Include/Exclude default entities and units to load

sidfeiner opened this issue

Is your feature request related to a problem? Please describe.
I want to use the default quantities and units, but parsing a large body of text takes a long time when all of the default units are loaded.
I would also like to include or exclude prefixes, since some of them are irrelevant in certain use cases and cause false positives. For example, the "atto" prefix produces attometres, whose symbol matches "AM", even when AM is simply part of a timestamp.

Describe the solution you'd like
The ability to include or exclude specific default entities and units when they are loaded from the default entities.json and units.json files.
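A minimal sketch of what such filtering could look like, assuming the loaded units are a dict keyed by unit name, each entry carrying an "entity" field. `filter_units`, its parameters, and the JSON layout below are hypothetical illustrations, not part of quantulum3's API.

```python
import json
from typing import Iterable, Optional


def filter_units(
    units: dict,
    include_entities: Optional[Iterable[str]] = None,
    exclude_units: Iterable[str] = (),
    exclude_prefixes: Iterable[str] = (),
) -> dict:
    """Keep units whose entity is included and whose name is neither excluded nor prefixed.

    Hypothetical helper: include_entities=None means "keep every entity".
    """
    include = set(include_entities) if include_entities is not None else None
    excluded = set(exclude_units)
    prefixes = tuple(exclude_prefixes)  # str.startswith(()) is always False
    return {
        name: spec
        for name, spec in units.items()
        if name not in excluded
        and not name.startswith(prefixes)
        and (include is None or spec["entity"] in include)
    }


# Hypothetical, simplified stand-in for units.json; prefixed units are listed
# explicitly here, which may not match how quantulum3 actually stores them.
raw = json.loads("""
{
  "metre": {"entity": "length", "symbols": ["m"]},
  "attometre": {"entity": "length", "symbols": ["am"]},
  "second": {"entity": "time", "symbols": ["s"]},
  "kilogram": {"entity": "mass", "symbols": ["kg"]}
}
""")

units = filter_units(raw, include_entities=["length", "time"], exclude_prefixes=["atto"])
print(sorted(units))  # ['metre', 'second']
```

Filtering before the units are handed to the parser keeps the unwanted entries out of every later lookup, which addresses both the load-time concern and the "AM" false positive.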

Hi,

Similar experience here. Quantulum is working really well for us (thanks, by the way!), but to get the most out of it we needed to tweak the surfaces and symbols of some existing units. I couldn't find a way to do this without cloning the repo and publishing my own package on our own PyPI server.

This also extends to the classifier model: there doesn't seem to be a way to load a model that isn't stored in the repo's source code.

This isn't a problem as a one-off, but if we extend to other datasets that need similar tweaks, we'll quickly find ourselves in a muddle. It also makes feeding code improvements back upstream more difficult, because I now have a fork of Quantulum on my employer's Bitbucket server.

I'm happy to have a look at some changes if someone can point me in the right direction.

Both a custom classifier and a custom units.json should be doable. They are currently loaded from hardcoded locations within the package; support for passing in custom locations is something I would appreciate too, but I never got around to implementing it.

Essentially, you just have to find where "units.json" and the classifier binary are loaded, and add arguments to parse/extract and all recursively called functions that let you specify a custom "units.json" or classifier binary. (Note: if a custom units file is given and the classifier is enabled, a custom classifier will likely have to be passed in as well.)
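A minimal sketch of the argument-threading approach described above: the public entry point accepts optional paths and passes them down to every helper that loads data, falling back to built-in data when a path is `None`. All names here (`parse`, `load_units`, `_match_symbols`, `units_path`, `classifier_path`, `use_classifier`) are hypothetical stand-ins rather than quantulum3's real functions or signatures, and actual classifier loading is omitted; only the plumbing is shown.

```python
import json
import os
import tempfile
import warnings
from functools import lru_cache
from typing import Optional

# Stand-in for the bundled units.json (hypothetical, simplified layout).
_BUILTIN_UNITS = {"metre": {"symbols": ["m"]}, "second": {"symbols": ["s"]}}


@lru_cache(maxsize=None)
def load_units(units_path: Optional[str] = None) -> dict:
    """Load units from a custom JSON file, or fall back to the built-in set.

    Cached per path, so edits to a file after its first load are not picked up.
    """
    if units_path is None:
        return _BUILTIN_UNITS
    with open(units_path, encoding="utf-8") as fh:
        return json.load(fh)


def _match_symbols(token: str, units_path: Optional[str]) -> Optional[str]:
    # Inner helper: receives units_path from its caller instead of a hardcoded location.
    for name, spec in load_units(units_path).items():
        if token in spec["symbols"]:
            return name
    return None


def parse(
    text: str,
    units_path: Optional[str] = None,
    classifier_path: Optional[str] = None,
    use_classifier: bool = False,
) -> list:
    """Return (value, unit) pairs for 'number symbol' token pairs in text."""
    if units_path is not None and use_classifier and classifier_path is None:
        # Custom units usually need a matching classifier, so flag the mismatch.
        warnings.warn("custom units_path given without a custom classifier_path")
    tokens = text.split()
    results = []
    for value, symbol in zip(tokens, tokens[1:]):
        unit = _match_symbols(symbol, units_path)
        if unit is None:
            continue
        try:
            results.append((float(value), unit))
        except ValueError:
            pass  # the token before the symbol is not a number
    return results


print(parse("ran 5 m in 3 s"))  # [(5.0, 'metre'), (3.0, 'second')]

# Write a custom units file and point parse() at it.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as fh:
    json.dump({"inch": {"symbols": ["in"]}}, fh)
print(parse("a 12 in board", units_path=fh.name))  # [(12.0, 'inch')]
os.remove(fh.name)
```

Defaulting every new parameter to `None` keeps existing calls working unchanged, while the per-path cache avoids re-reading the file on every call.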