rockt / ChemSpot

ChemSpot is a named entity recognition tool for identifying mentions of chemicals in natural language texts, including trivial names, drugs, abbreviations, molecular formulas and IUPAC entities. Since the different classes of relevant entities have rather different naming characteristics, ChemSpot uses a hybrid approach combining a Conditional Random Field with a dictionary. ChemSpot is released under the Common Public License 1.0.

Home Page: https://www.informatik.hu-berlin.de/forschung/gebiete/wbi/resources/chemspot/chemspot/


Improve match expansion

thuber opened this issue

In addition to #15, improve match expansion so that matches are not expanded for terms such as "non-cholesterol".

We could use q-gram statistics to get rid of common suffixes (e.g. -induced).
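
A rough sketch of how the suffix idea might look, written in Python for brevity rather than against ChemSpot's actual code. All names here (`trailing_qgrams`, `common_suffixes`, `strip_common_suffix`) and the thresholds are hypothetical. The assumption is that we have a list of tokens from a corpus, count the trailing character q-grams of each distinct token, and treat hyphen-initial q-grams that recur across several tokens as suffixes to cut off after expansion.

```python
from collections import Counter


def trailing_qgrams(token, q_min=4, q_max=8):
    """Yield the last q characters of token for every q in [q_min, q_max]."""
    for q in range(q_min, q_max + 1):
        if len(token) > q:  # require at least one character before the q-gram
            yield token[-q:]


def common_suffixes(tokens, min_count=3, q_min=4, q_max=8):
    """Collect hyphen-initial trailing q-grams shared by at least min_count distinct tokens.

    Hypothetical heuristic: thresholds are illustrative, not tuned.
    """
    counts = Counter()
    # Count over distinct lowercased tokens so one very frequent word does not dominate.
    for tok in {t.lower() for t in tokens}:
        counts.update(trailing_qgrams(tok, q_min, q_max))
    return {g for g, c in counts.items() if g.startswith("-") and c >= min_count}


def strip_common_suffix(expanded, suffixes):
    """Remove the longest matching common suffix from an expanded match, if any."""
    lowered = expanded.lower()
    for suf in sorted(suffixes, key=len, reverse=True):
        if lowered.endswith(suf) and len(expanded) > len(suf):
            return expanded[: -len(suf)]
    return expanded


corpus_tokens = [
    "cisplatin-induced",
    "ethanol-induced",
    "glucose-induced",
    "insulin-dependent",
    "5-fluorouracil",
]
suffixes = common_suffixes(corpus_tokens)
trimmed = strip_common_suffix("doxorubicin-induced", suffixes)
untouched = strip_common_suffix("non-cholesterol", suffixes)
```

This only covers the suffix side (e.g. -induced). A prefix case such as "non-cholesterol" would need a mirrored check on leading q-grams, which is not shown here.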