mozilla / bleach

Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes

Home Page: https://bleach.readthedocs.io/en/latest/

bug: `linkify` with `parse_email=True` doesn't handle "%" and "?" in `addr-specs`

larseggert opened this issue

Describe the bug

linkify with parse_email=True does not percent-encode "%" and "?" in the generated mailto: href. Both characters may occur in RFC822 addr-specs (see https://datatracker.ietf.org/doc/html/rfc2368#section-6).

  • Python Version: 3.10.4
  • Bleach Version: 5.0.0

To Reproduce

Steps to reproduce the behavior:

>>> bleach.linkify("gorby%kremvax@example.com", parse_email=True)
'<a href="mailto:gorby%kremvax@example.com">gorby%kremvax@example.com</a>'

Expected behavior

I expected RFC822 special characters to be percent-encoded according to RFC2368:

>>> bleach.linkify("gorby%kremvax@example.com", parse_email=True)
'<a href="mailto:gorby%25kremvax@example.com">gorby%kremvax@example.com</a>'

Additional context

Same issue exists with "?"; I didn't test other RFC822 special characters but suspect they are similarly left unquoted.
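To make the expected encoding concrete, here is a minimal standard-library sketch. The helper name mailto_href is hypothetical and is not part of Bleach; it only illustrates percent-encoding the local part of an addr-spec before building the mailto: URL.

```python
from urllib.parse import quote


def mailto_href(addr_spec: str) -> str:
    """Build a mailto: URL with a percent-encoded local part (hypothetical helper)."""
    # Split on the last "@" so that only the local part is encoded.
    local, sep, domain = addr_spec.rpartition("@")
    if not sep:
        raise ValueError("not an addr-spec: %r" % addr_spec)
    # quote() leaves letters, digits, "_.-~" and "/" alone and encodes the rest,
    # including "%" and "?".
    return "mailto:" + quote(local) + "@" + domain


print(mailto_href("gorby%kremvax@example.com"))  # mailto:gorby%25kremvax@example.com
print(mailto_href("what?ever@example.com"))      # mailto:what%3Fever@example.com
```

The domain is left untouched here on the assumption that it contains no characters that need encoding.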

Thank you for the bug report! I'd appreciate a pull request from anyone who wants to tackle this. I don't think I'm going to get to it.

I tried wrapping urllib.parse.quote() around the match.group(0) bit in

(None, "href"): "mailto:%s" % match.group(0),

but that seems to have no effect.
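An alternative to patching the library might be a linkify callback, sketched below. It assumes the documented callback shape (a function taking attrs and new, where attrs is keyed by (namespace, name) tuples plus "_text"). The name encode_mailto is hypothetical, and the sketch is exercised on a hand-built dict rather than through Bleach itself.

```python
from urllib.parse import quote

HREF = (None, "href")  # attribute keys are (namespace, name) tuples


def encode_mailto(attrs, new=False):
    """Hypothetical linkify callback: percent-encode the local part of new mailto: links."""
    href = attrs.get(HREF, "")
    # Only touch links that linkify itself created, to avoid double-encoding
    # hrefs that were already present (and possibly already encoded) in the input.
    if new and href.startswith("mailto:") and "@" in href:
        local, _, domain = href[len("mailto:"):].rpartition("@")
        attrs[HREF] = "mailto:" + quote(local) + "@" + domain
    return attrs


# Exercise the callback on a hand-built attrs dict, without importing bleach.
attrs = {HREF: "mailto:gorby%kremvax@example.com", "_text": "gorby%kremvax@example.com"}
print(encode_mailto(attrs, new=True)[HREF])  # mailto:gorby%25kremvax@example.com
```

Such a function would be passed via linkify's callbacks argument. Whether it takes effect in 5.0.0 was not verified here, and supplying callbacks replaces the default list rather than extending it.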

commented

I have noticed a similar problem with the clean() function. Maybe it has the same root cause.

Example:

In [1]: import bleach

In [2]: bleach.clean('<a href="https://example.org?a=1&b=2" target="_blank" rel="nofollow">example</a>')
Out[2]: '<a href="https://example.org?a=1&amp;b=2">example</a>'

Notice that & is changed to &amp;.

@jozo that's not the same thing. The & should be escaped to &amp;.
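To illustrate why this differs from the mailto case, here is a short standard-library sketch (it does not use Bleach). Escaping "&" as "&amp;" is HTML-level encoding of the attribute value, and an HTML parser decodes it back to the original URL.

```python
from html import escape, unescape

url = "https://example.org?a=1&b=2"

# HTML-escape the URL for use inside an attribute value.
attr_value = escape(url, quote=True)
print(attr_value)  # https://example.org?a=1&amp;b=2

# Decoding the attribute value, as a browser would, restores the original URL.
print(unescape(attr_value) == url)  # True
```

Percent-encoding in a mailto: URL is a separate, URL-level layer that HTML escaping does not touch.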