KennethEnevoldsen / asent

Asent is a Python library for performing efficient and transparent sentiment analysis using spaCy.

Home Page: https://kennethenevoldsen.github.io/asent/

Duplicated text in `prediction` visualization when there are overlapping spans

tomaarsen opened this issue

Hello!

Bug details

The Token polarity attribute will perform a look-back of (by default) 3 tokens, and the span of the resulting TokenPolarityOutput may thus be larger than the token itself. This causes potential span overlaps, such as in the example below. This is problematic with the current visualize implementation for the prediction style, as it duplicates the overlapping text.

You may have already been aware of this, given #52, but I figured I would make this report regardless.
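
To illustrate the overlap described above, here is a minimal sketch in plain Python that does not use asent's actual visualization code. The span offsets are hypothetical (chosen to mimic a 3-token look-back, not taken from a real parse), and `naive_render` and `merge_overlapping` are illustrative helper names, not part of asent or spaCy. It shows how rendering each span independently duplicates shared tokens, and how merging overlapping spans first would avoid that.

```python
from typing import List, Tuple

# (start, end) token offsets, end exclusive
Span = Tuple[int, int]


def naive_render(tokens: List[str], spans: List[Span]) -> str:
    """Render every span on its own, as if spans never overlapped."""
    return " ".join(" ".join(tokens[start:end]) for start, end in spans)


def merge_overlapping(spans: List[Span]) -> List[Span]:
    """Merge spans that share at least one token into a single span."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:
            # Overlaps the previous span: extend it instead of adding a new one.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


tokens = ["I", "am", "not", "pretty", "quite", "unhappy"]

# Hypothetical spans: "pretty" looks back to "not", and "unhappy"
# looks back 3 tokens, also reaching "not".
spans = [(0, 1), (1, 2), (2, 4), (2, 6)]

duplicated = naive_render(tokens, spans)
deduplicated = naive_render(tokens, merge_overlapping(spans))

print(duplicated)
print(deduplicated)
```

Merging is only one possible approach; it discards the per-token boundaries inside the merged span, so a real fix might instead need to decide which polarity each shared token should display.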

How to reproduce the behaviour

import asent
import spacy

# load spacy pipeline
nlp = spacy.load("en_core_web_lg")

# add the rule-based sentiment model
nlp.add_pipe("asent_en_v1")

doc = nlp("I am not pretty quite unhappy")
# doc = nlp("I am not great, unhappy is how I would describe myself.")

asent.visualize(doc, style="prediction")

Bugged results

[Two screenshots of the rendered prediction visualization]

(See also #58 for a secondary bug related to the second image, i.e. the unhappy section being regarded as positive)

My Environment

  • asent version: 0.4.3
  • spaCy version: 3.4.1
  • Platform: Windows-10-10.0.19043-SP0
  • Python version: 3.10.1
  • Pipelines: en_core_web_lg (3.4.0)

- Tom Aarsen

You are indeed correct: solving #52 should alleviate this issue. However, this parse (as you note) seems problematic.