Duplicated text in `prediction` visualization when there are overlapping spans

Question

tomaarsen opened this issue 2 years ago

Hello!

Bug details

The Token polarity attribute performs a look-back of (by default) 3 tokens, so the span of the resulting `TokenPolarityOutput` can be larger than the token itself. Spans from neighbouring tokens can therefore overlap, as in the example below. The current `visualize` implementation for the `prediction` style does not account for this and renders the overlapping text more than once.

You may have already been aware of this, given #52, but I figured I would make this report regardless.
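To make the mechanism concrete, here is a minimal pure-Python sketch of how rendering overlapping spans one by one duplicates text, and how merging them first avoids it. It does not use asent or spaCy and is not asent's actual rendering code: the function names `render_naive` and `merge_overlapping` and the span offsets are hypothetical, chosen only to mimic a 3-token look-back on the example sentence.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def render_naive(tokens: List[str], spans: List[Span]) -> str:
    """Render every span independently, so shared tokens appear more than once."""
    return " ".join(" ".join(tokens[start:end]) for start, end in spans)


def merge_overlapping(spans: List[Span]) -> List[Span]:
    """Merge spans that share at least one token into a single span."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:
            # Overlaps the previous span: extend it instead of adding a new one.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


tokens = ["I", "am", "not", "pretty", "quite", "unhappy"]

# Hypothetical spans (an assumption, not real asent output):
# "pretty" looks back to "not", and "unhappy" looks back three tokens to "not".
spans = [(0, 1), (1, 2), (2, 4), (2, 6)]

naive = render_naive(tokens, spans)
fixed = render_naive(tokens, merge_overlapping(spans))

print(naive)  # I am not pretty not pretty quite unhappy
print(fixed)  # I am not pretty quite unhappy
```

Merging is only one possible approach: it removes the duplication but collapses the per-span polarity into a single region, so a real fix would still need to decide which polarity to display for the merged span.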

How to reproduce the behaviour

import asent
import spacy

# load spacy pipeline
nlp = spacy.load("en_core_web_lg")

# add the rule-based sentiment model
nlp.add_pipe("asent_en_v1")

doc = nlp("I am not pretty quite unhappy")
# doc = nlp("I am not great, unhappy is how I would describe myself.")

asent.visualize(doc, style="prediction")

Bugged results

(See also #58 for a secondary bug related to the second image, i.e. the unhappy section being regarded as positive)

My Environment

asent version: 0.4.3
spaCy version: 3.4.1
Platform: Windows-10-10.0.19043-SP0
Python version: 3.10.1
Pipelines: en_core_web_lg (3.4.0)

Tom Aarsen

Kenneth Enevoldsen · Answer 1 · Fri Aug 26 2022 16:52:58 GMT+0800 (China Standard Time)

You are indeed correct: solving #52 should alleviate this issue. However, this parse (as you note) seems problematic.