Abbreviation detection not working where short form contains a space followed by digits

Question

ICLRandD opened this issue 5 years ago

The current implementation of AbbreviationDetector() does not handle abbreviations whose short form contains a space followed by a number.

For example, in this scenario:

The Proceeds of Crime Act 2002 ("PoCA 2002")

The abbreviation is not matched.

The original implementation in scispaCy does not appear to have been built to handle cases in which the short form is enclosed in quotation marks.
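To make the two sticking points concrete (quote marks around the short form, and a space followed by digits inside it), here is a minimal standard-library sketch. It is not scispaCy's code and not the code of any published package. The function names (`is_short_form`, `best_long_form`, `extract_pairs`) and the acceptance rule for short forms are assumptions made for illustration. The long-form search follows the usual Schwartz-Hearst idea of matching short-form characters right to left against the preceding text.

```python
import re
from typing import Dict, Optional

# A parenthesised candidate such as ("PoCA 2002") or (PoCA 2002).
PAREN = re.compile(r"\(([^()]+)\)")
# Straight and curly quote characters that may wrap the short form.
QUOTES = "\"'\u201c\u201d\u2018\u2019"


def is_short_form(candidate: str) -> bool:
    """Assumed rule: one token, or one token plus an all-digit token."""
    tokens = candidate.split()
    if not 1 <= len(tokens) <= 2:
        return False
    if len(tokens) == 2 and not tokens[1].isdigit():
        return False
    head = tokens[0]
    return (
        2 <= len(head) <= 10
        and head[0].isalnum()
        and any(c.isalpha() for c in head)
    )


def best_long_form(short: str, preceding: str) -> Optional[str]:
    """Match short-form characters right to left against the preceding text."""
    s_idx = len(short) - 1
    l_idx = len(preceding) - 1
    while s_idx >= 0:
        ch = short[s_idx].lower()
        if not ch.isalnum():
            # Skip spaces and punctuation inside the short form.
            s_idx -= 1
            continue
        # Move left until the character matches; the first short-form
        # character must also sit at the start of a word.
        while l_idx >= 0 and (
            preceding[l_idx].lower() != ch
            or (s_idx == 0 and l_idx > 0 and preceding[l_idx - 1].isalnum())
        ):
            l_idx -= 1
        if l_idx < 0:
            return None
        l_idx -= 1
        s_idx -= 1
    # Extend the match leftwards to the start of the word it landed in.
    start = preceding.rfind(" ", 0, l_idx + 1) + 1
    return preceding[start:]


def extract_pairs(text: str) -> Dict[str, str]:
    """Return {short_form: long_form} for parenthesised candidates."""
    pairs = {}
    for match in PAREN.finditer(text):
        # Strip whitespace, then wrapping quotes, before validating.
        candidate = match.group(1).strip().strip(QUOTES).strip()
        if not is_short_form(candidate):
            continue
        long_form = best_long_form(candidate, text[: match.start()].rstrip())
        if long_form:
            pairs[candidate] = long_form
    return pairs


print(extract_pairs('The Proceeds of Crime Act 2002 ("PoCA 2002")'))
# {'PoCA 2002': 'Proceeds of Crime Act 2002'}
```

The point of the sketch is that the backward character match itself copes with the trailing digits; what has to change is the candidate step, which must strip the quotes and accept a short form containing a space.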

Phil Gooch · Answer 1 · Tue Dec 31 2019 00:16:19 GMT+0800 (China Standard Time)

You might be interested in an alternative Python implementation of Schwartz-Hearst which handles this scenario.

https://github.com/philgooch/abbreviation-extraction

E.g.

pip install abbreviations

In [1]: from abbreviations import schwartz_hearst

In [2]: schwartz_hearst.extract_abbreviation_definition_pairs(doc_text='The Proceeds of Crime Act 2002 ("PoCA 2002")')
Out[2]: {'PoCA 2002': 'Proceeds of Crime Act 2002'}