Use Quantulum to remove quantities from a string
AxelStbl opened this issue.
Is your feature request related to a problem? Please describe.
I would like to use Quantulum to remove quantities from text, not only to extract them.
Describe the solution you'd like
Basically, I want to get the list of surfaces found so that I can remove them from the text, as in the modification I made here:
```python
import logging

# Modified version of parse() from quantulum3's parser module. It relies on
# that module's own helpers (clean_text, extract_spellout_values,
# substitute_values, reg, get_values, get_unit, get_surface, build_quantity,
# _LOGGER) and additionally returns the surface string of every quantity found.
def parse(text, lang="en_US", verbose=False):
    """
    Extract all quantities from unstructured text.

    Returns a tuple (quantities, surfaces).
    """
    log_format = "%(asctime)s --- %(message)s"
    logging.basicConfig(format=log_format)
    if verbose:  # pragma: no cover
        prev_level = logging.root.getEffectiveLevel()
        logging.root.setLevel(logging.DEBUG)
        # _LOGGER.debug("Verbose mode")

    orig_text = text
    # _LOGGER.debug('Original text: "%s"', orig_text)

    text = clean_text(text, lang)
    values = extract_spellout_values(text, lang)
    text, shifts = substitute_values(text, values)

    quantities = []
    surfaces = []
    for item in reg.units_regex(lang).finditer(text):
        groups = {k: v for k, v in item.groupdict().items() if v and v.strip()}
        # _LOGGER.debug(u"Quantity found: %s", groups)
        try:
            uncert, values = get_values(item, lang)
            unit, unit_shortening = get_unit(item, text)
            surface, span = get_surface(shifts, orig_text, item, text, unit_shortening)
            objs = build_quantity(
                orig_text, text, item, values, unit, surface, span, uncert, lang
            )
            if objs is not None:
                quantities += objs
                # Record the surface only when a quantity was actually built,
                # so every surface corresponds to a returned quantity.
                surfaces.append(surface)
        except ValueError as err:
            _LOGGER.debug("Could not parse quantity: %s", err)

    if verbose:  # pragma: no cover
        logging.root.setLevel(prev_level)
    return quantities, surfaces
```
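With `surfaces` returned, the removal itself can be a plain string operation. A minimal sketch, assuming each surface is a literal substring of the original text; `remove_surfaces` is a hypothetical helper, not part of quantulum3:

```python
import re


def remove_surfaces(text, surfaces):
    """Remove each surface string from text, one occurrence per entry.

    str.replace with count=1 deletes only the first remaining occurrence,
    so a surface listed twice is removed twice.
    """
    for surface in surfaces:
        text = text.replace(surface, "", 1)
    # Collapse the runs of spaces left behind by the deletions.
    return re.sub(r" {2,}", " ", text).strip()


# Hypothetical usage with the modified parse() above:
# text = "The pool is 25 metres long and 2 m deep."
# quantities, surfaces = parse(text)
# print(remove_surfaces(text, surfaces))
```

Searching by text has a pitfall: the first match is not necessarily the quantity. In "Room 2 measures 2 m", the surface "2 m" also matches the start of "2 measures", which is the occurrence that gets deleted.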
AFAIK, each returned quantity has a "surface" attribute that contains its surface in the passed string. Maybe you can use that instead? There may even be the exact start and end of the surface (two indices).
Thanks, indeed it works like this!
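A minimal sketch of the suggested approach, which needs no change to the library: collect each returned quantity's start and end indices and cut those characters out of the string. It assumes quantulum3's `parser.parse` and that each `Quantity` exposes a `span` attribute holding `(start, end)` indices into the text that was passed in; `remove_spans` and `remove_quantities` are hypothetical helper names.

```python
import re


def remove_spans(text, spans):
    """Delete the characters covered by each (start, end) span.

    Indices are collected into a set first, so the spans may be unsorted
    or overlapping without shifting each other's positions.
    """
    removed = set()
    for start, end in spans:
        removed.update(range(start, end))
    kept = "".join(ch for i, ch in enumerate(text) if i not in removed)
    # Collapse the runs of spaces left behind by the deletions.
    return re.sub(r" {2,}", " ", kept).strip()


def remove_quantities(text):
    """Strip every quantity quantulum3 finds, using each Quantity's span.

    Assumes quantulum3 is installed and that Quantity.span holds
    (start, end) indices into the string passed to parser.parse.
    """
    from quantulum3 import parser  # imported lazily; only needed here

    return remove_spans(text, [q.span for q in parser.parse(text)])
```

Deleting by index rather than searching for the surface text avoids removing an identical substring elsewhere in the sentence, such as the "2 m" at the start of "2 measures".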