Detection of successive empty sections

Question

paul-bssr opened this issue 2 years ago · comments

Description

Using the beta pipeline "eds.sections", I encountered a bug in the detection of sections that follow an empty section: such a section is not detected, and its content is labeled as belonging to the preceding empty section.

Empty sections are quite common in documents, so this could lead to errors in section labelling of entities.

How to reproduce the bug

For instance, in the following example, the sections "Antécédents :" and "Conclusion :" are not distinguished, so all the content of the "Conclusion" section is tagged as "Antécédents".

import spacy

nlp = spacy.blank("eds")

nlp.add_pipe("eds.sentences")
nlp.add_pipe("eds.normalizer")
nlp.add_pipe("eds.sections")

# Matcher definition
regex = dict(
    # Myolyse
    rhabdomyolyse="rhabdom[yi]ol[yi]se",
    myolyse="m[yi]ol[yi]se",
)

nlp.add_pipe(
    "eds.matcher",
    config=dict(
        regex=regex,
        attr="NORM",
        ignore_excluded=True,
    ),
)

text = """
Antécédents : 
Conclusion : 
Patient va mieux

Au total:
sortie du patient
"""

doc = nlp(text)

doc.spans["sections"]
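
For comparison, the segmentation one would expect on this text can be sketched in plain Python, without EDS-NLP. This is only an illustration of the expected behaviour, not the pipeline's implementation: `TITLE_PATTERN` and `split_sections` are hypothetical names, and the three section titles are hard-coded for this example.

```python
import re

# Hypothetical title pattern, hard-coded for this example only
# (this is not the lexicon used by eds.sections).
TITLE_PATTERN = re.compile(r"^(Antécédents|Conclusion|Au total)\s*:", re.MULTILINE)


def split_sections(text):
    """Return (title, body) pairs; each section runs up to the next title."""
    matches = list(TITLE_PATTERN.finditer(text))
    sections = []
    for i, match in enumerate(matches):
        # A section ends where the next title starts, or at the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        # Empty sections are kept, so the next title still opens its own section.
        sections.append((match.group(1), body))
    return sections


text = """
Antécédents : 
Conclusion : 
Patient va mieux

Au total:
sortie du patient
"""

for title, body in split_sections(text):
    print(f"{title!r}: {body!r}")
```

Here "Antécédents" comes out with an empty body and "Patient va mieux" is attached to "Conclusion", which is the result the reproduction above should also give.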

Your Environment

Operating System:
Python Version Used: 3.10.0
spaCy Version Used: 3.4.1
EDS-NLP Version Used: 0.6.1
Environment Information:

Perceval Wajsburt · Answer 1 · Mon Aug 08 2022 21:34:46 GMT+0800 (China Standard Time)

Thank you for making us aware of this! I'll look into it.

Perceval Wajsburt · Answer 2 · Tue Aug 09 2022 00:33:02 GMT+0800 (China Standard Time)

PR #114 should solve this. Thanks again for the issue!

While waiting for the next release, you can install from the master branch directly with

pip install git+https://github.com/aphp/edsnlp@master
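
After installing, a quick way to check that the package is present in the current environment is to query its metadata with the standard library. This is a minimal sketch: `installed_version` is a hypothetical helper, and the distribution name `edsnlp` is assumed to match the installed package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Assumption: the distribution is named "edsnlp".
print(installed_version("edsnlp"))
```

A build from the master branch may report the same version string as the last release, so this only confirms that the package is installed, not that the fix is included.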

paul-bssr · Answer 3 · Tue Aug 09 2022 16:08:35 GMT+0800 (China Standard Time)

Thanks a lot for the super quick fix!