rockt / ChemSpot

ChemSpot is a named entity recognition tool for identifying mentions of chemicals in natural language texts, including trivial names, drugs, abbreviations, molecular formulas and IUPAC entities. Since the different classes of relevant entities have rather different naming characteristics, ChemSpot uses a hybrid approach combining a Conditional Random Field with a dictionary. ChemSpot is released under the Common Public License 1.0.

Home Page: https://www.informatik.hu-berlin.de/forschung/gebiete/wbi/resources/chemspot/chemspot/

Tagging text is slow

rockt opened this issue

934a481

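    public List<Mention> tag(String text) throws UIMAException {
        // A new JCas is created on every call; this is the expensive step.
        JCas jcas = JCasFactory.createJCas(typeSystem);
        jcas.setDocumentText(text);
        // Wrap the whole text in a single PubmedDocument annotation with an empty PMID.
        PubmedDocument pd = new PubmedDocument(jcas);
        pd.setBegin(0);
        pd.setEnd(text.length());
        pd.setPmid("");
        pd.addToIndexes(jcas);
        return tag(jcas);
    }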

This is slow because a new JCas is initialized every time we tag a string. Instead, keep one pre-initialized JCas and reset it each time this method is called.
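
A minimal, self-contained sketch of the "create once, reset per call" idea. It assumes a hypothetical `ReusableDocument` class standing in for the JCas, so it runs without UIMA. In UIMA itself, `JCas` has a `reset()` method that empties the CAS. The tagging step is only a placeholder, and `ReusingTagger` is an illustrative name, not ChemSpot's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a JCas: expensive to create, cheap to reset.
class ReusableDocument {
    static int created = 0;                  // number of (expensive) constructions so far
    private String text = "";
    private final List<String> annotations = new ArrayList<>();

    ReusableDocument() {
        created++;                           // pretend this is the costly type-system setup
    }

    // Analogous to JCas.reset(): drop the text and all annotations from the last call.
    void reset() {
        text = "";
        annotations.clear();
    }

    void setDocumentText(String t) { text = t; }
    String getDocumentText() { return text; }
    void addAnnotation(String a) { annotations.add(a); }
    List<String> getAnnotations() { return annotations; }
}

// Hypothetical tagger that creates one document up front and reuses it.
// NOT thread-safe: concurrent calls to tag() would overwrite each other's state.
class ReusingTagger {
    private final ReusableDocument doc = new ReusableDocument();

    List<String> tag(String text) {
        doc.reset();                         // clear state left over from the previous call
        doc.setDocumentText(text);
        // Placeholder "tagging": every whitespace-separated token counts as a mention.
        for (String token : doc.getDocumentText().split("\\s+")) {
            if (!token.isEmpty()) {
                doc.addAnnotation(token);
            }
        }
        // Return a copy so the caller's result survives the next reset().
        return new ArrayList<>(doc.getAnnotations());
    }
}
```

However many times `tag` is called, the expensive constructor runs only once. The returned list is copied so that the next `reset()` does not clear results the caller still holds.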

This is not quite that easy if we want to allow this method to be called from multiple threads (which seems sensible to me). Several threads cannot work on the same JCas at once, so we must either make the reuse thread-safe while still allowing concurrent calls, or declare the method synchronized and thereby prevent any multithreading.
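
One possible middle ground between a single shared JCas and a fully `synchronized` method is to give each thread its own reusable instance. The sketch below uses the standard `java.lang.ThreadLocal`. It assumes a hypothetical `Workspace` class standing in for the JCas and a placeholder tagging step. It illustrates the pattern and is not ChemSpot's actual fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a JCas: costly to create, reusable after reset().
class Workspace {
    static final AtomicInteger created = new AtomicInteger();  // thread-safe creation counter
    private String text = "";

    Workspace() {
        created.incrementAndGet();           // pretend this is the expensive initialization
    }

    void reset() { text = ""; }
    void setText(String t) { text = t; }
    String getText() { return text; }
}

// Each thread lazily gets its own Workspace on first use and reuses it afterwards,
// so no two threads ever touch the same instance and tag() needs no locking.
class ThreadLocalTagger {
    private final ThreadLocal<Workspace> workspace =
            ThreadLocal.withInitial(Workspace::new);

    List<String> tag(String text) {
        Workspace ws = workspace.get();      // this thread's private instance
        ws.reset();
        ws.setText(text);
        List<String> mentions = new ArrayList<>();
        // Placeholder "tagging": whitespace-separated tokens stand in for real mentions.
        for (String token : ws.getText().split("\\s+")) {
            if (!token.isEmpty()) {
                mentions.add(token);
            }
        }
        return mentions;
    }
}
```

With a fixed-size thread pool, this creates at most one instance per worker thread. Each instance lives as long as its thread, so with a real JCas that means one full CAS held per thread. A bounded pool of pre-initialized instances, such as a `BlockingQueue`, is an alternative when the number of threads is not fixed.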