Improve match expansion
thuber opened this issue · comments
In addition to #15, improve match expansion so that matches are not expanded for terms such as "non-cholesterol".
We could use q-gram statistics to get rid of common suffixes (e.g. "-induced").
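A rough sketch of the suffix idea, in Python for brevity. It is not ChemSpot code: the function names `common_suffixes` and `trim_expansion`, the `min_count` threshold, and the toy corpus are all made up for illustration. It also counts hyphen-led suffixes rather than general character q-grams, which is a simpler stand-in for the statistics proposed above.

```python
from collections import Counter


def common_suffixes(tokens, min_count=2):
    """Collect hyphen-led suffixes (e.g. "-induced") seen at least min_count times.

    Hypothetical helper: min_count is an assumed threshold, not a tuned value.
    """
    counts = Counter()
    for token in tokens:
        head, sep, tail = token.rpartition("-")
        # Only count tokens that really have text on both sides of the last hyphen.
        if sep and head and tail:
            counts["-" + tail.lower()] += 1
    return {suffix for suffix, n in counts.items() if n >= min_count}


def trim_expansion(token, match, suffixes):
    """Expand match to the full token unless the extra text is a common suffix."""
    if not token.startswith(match):
        return token
    rest = token[len(match):]
    if rest.lower() in suffixes:
        return match  # keep the unexpanded match, drop the suffix
    return token


# Toy corpus, invented for this example.
corpus = [
    "cocaine-induced",
    "ethanol-induced",
    "drug-induced",
    "glucose-6-phosphate",
    "insulin-dependent",
]
suffixes = common_suffixes(corpus)

print(trim_expansion("morphine-induced", "morphine", suffixes))
print(trim_expansion("glucose-6-phosphate", "glucose", suffixes))
```

With this corpus only "-induced" passes the threshold, so "morphine-induced" stays as "morphine" while "glucose-6-phosphate" is still expanded. A prefix case like "non-cholesterol" would need the same counting applied to the start of the token.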
ChemSpot is a named entity recognition tool for identifying mentions of chemicals in natural language texts, including trivial names, drugs, abbreviations, molecular formulas and IUPAC entities. Since the different classes of relevant entities have rather different naming characteristics, ChemSpot uses a hybrid approach combining a Conditional Random Field with a dictionary. ChemSpot is released under the Common Public License 1.0.
https://www.informatik.hu-berlin.de/forschung/gebiete/wbi/resources/chemspot/chemspot/