mozilla / bleach

Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes

Home Page: https://bleach.readthedocs.io/en/latest/

bug: linkify with entities inside anchor strings are incorrectly escaped

mcleanmds opened this issue

Describe the bug

Calling linkify on a string with entities inside anchor element text results in the & character of each entity being incorrectly escaped to &amp;
e.g. &nbsp; -> &amp;nbsp;

Python and bleach versions

  • Python Version: 3.9.5
  • Bleach Version: 6.0.0

To Reproduce

A simple test to reproduce the behavior:

from bleach import linkify

# Entities appear both inside the anchor text and after it.
text = r'<p><a href="/">Some&nbsp;entity&rsquo;s</a>More&nbsp;entity&rsquo;s</p>'
expected = r'<p><a href="/" rel="nofollow">Some&nbsp;entity&rsquo;s</a>More&nbsp;entity&rsquo;s</p>'
assert linkify(text) == expected

Expected behavior

linkify(r'<a href="/">Some&nbsp;entity&rsquo;s</a>')
'<a href="/" rel="nofollow">Some&nbsp;entity&rsquo;s</a>'

Actual behavior

linkify(r'<a href="/">Some&nbsp;entity&rsquo;s</a>')
'<a href="/" rel="nofollow">Some&amp;nbsp;entity&amp;rsquo;s</a>'
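
The symptom above looks like double escaping: text that already contains character references has its ampersands escaped a second time. The sketch below reproduces that effect with the standard library only. It is an illustration of the symptom, not bleach's actual code path; the helper names `escape_all` and `escape_preserving_entities` and the `BARE_AMPERSAND` pattern are hypothetical and do not exist in bleach.

```python
import re
from html import escape

# Hypothetical pattern: an '&' that does NOT start a named, decimal,
# or hexadecimal character reference terminated by ';'.
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def escape_all(text: str) -> str:
    """Naive escaping: every '&' becomes '&amp;', even inside entities."""
    return escape(text, quote=False)


def escape_preserving_entities(text: str) -> str:
    """Escape only ampersands that are not already part of an entity."""
    return BARE_AMPERSAND.sub("&amp;", text)


anchor_text = "Some&nbsp;entity&rsquo;s"
print(escape_all(anchor_text))                  # Some&amp;nbsp;entity&amp;rsquo;s
print(escape_preserving_entities(anchor_text))  # Some&nbsp;entity&rsquo;s
```

The first call matches the actual behavior reported for anchor text, and the second matches the expected behavior, which suggests the anchor text is being escaped without regard to entities that are already present.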

Additional context

This bug was introduced in 6.0.0 with the fix for #501 and #692: #692

Thank you for the issue!