Duplicated text in `prediction` visualization when there are overlapping spans
tomaarsen opened this issue
Hello!
Bug details
The Token `polarity` attribute performs a look-back of (by default) 3 tokens, so the span of the resulting `TokenPolarityOutput` may be larger than the token itself. This can cause span overlaps, such as in the example below. This is problematic with the current `visualize` implementation for the `prediction` style, as it duplicates the overlapping text.
You may have already been aware of this, given #52, but I figured I would make this report regardless.
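To illustrate the mechanism without needing spaCy, here is a minimal, library-free sketch. It is not asent's actual rendering code: the helper names `render_naive` and `merge_overlapping` are hypothetical, and the two token-index spans are made-up values that only mimic what a 3-token look-back could produce for the sentence below.

```python
from typing import List, Tuple

# (start, end) token indices, end exclusive. Hypothetical representation,
# not the type asent uses internally.
Span = Tuple[int, int]


def render_naive(tokens: List[str], spans: List[Span]) -> str:
    """Render every span independently, so shared tokens appear repeatedly."""
    return " | ".join(" ".join(tokens[start:end]) for start, end in spans)


def merge_overlapping(spans: List[Span]) -> List[Span]:
    """Merge spans that share at least one token; adjacent spans stay separate."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


tokens = "I am not pretty quite unhappy".split()

# Assumed spans: "not pretty" for the token "pretty", and
# "not pretty quite unhappy" for the token "unhappy".
spans = [(2, 4), (2, 6)]

print(render_naive(tokens, spans))
# not pretty | not pretty quite unhappy

print(render_naive(tokens, merge_overlapping(spans)))
# not pretty quite unhappy
```

Merging is only one possible way to avoid the duplication. It discards the per-token polarity of the smaller span, so a real fix may prefer a different strategy.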
How to reproduce the behaviour
import asent
import spacy
# load spacy pipeline
nlp = spacy.load("en_core_web_lg")
# add the rule-based sentiment model
nlp.add_pipe("asent_en_v1")
doc = nlp("I am not pretty quite unhappy")
# doc = nlp("I am not great, unhappy is how I would describe myself.")
asent.visualize(doc, style="prediction")
Bugged results
(See also #58 for a secondary bug related to the second image, i.e. the `unhappy` section being regarded as positive.)
My Environment
- asent version: 0.4.3
- spaCy version: 3.4.1
- Platform: Windows-10-10.0.19043-SP0
- Python version: 3.10.1
- Pipelines: en_core_web_lg (3.4.0)
- Tom Aarsen
You are indeed correct: solving #52 should alleviate this issue. However, this parse (as you note) seems problematic.