apostrophecms / sanitize-html

Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis. Built on htmlparser2 for speed and tolerance

Returns an unexpected result when the HTML inner text begins with `&#xe`.

October10 opened this issue · comments

commented

To Reproduce

Step by step instructions to reproduce the behavior:

  1. Provide the library with the following HTML: <span>&#xe60b;</span>
  2. The library returns

Expected behavior

I expected the library to return (&#xe60b;) or (&amp;#xe60b;).

Describe the bug

With fewer characters, a similar problem appears: it also happens when the inner text is just &#xe.

Details

Version of Node.js:

v15.3.0

Server Operating System:

macOS Catalina 10.15.5

Version of Browser:

Google Chrome: 90.0.4430.93 (It seems that there is no such problem in version 89.)

commented

Text beginning with &#xe is recognized as a hexadecimal HTML entity by htmlparser2. The decodeEntities parser option may help here.
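
To make the explanation above concrete, here is a minimal sketch of what decoding a hexadecimal character reference amounts to. It is not htmlparser2's actual implementation: `decodeHexEntities` and `isPrivateUse` are hypothetical helpers written only for illustration, and this sketch requires the trailing semicolon, whereas real parsers may be more lenient.

```javascript
// Hypothetical helper, NOT part of sanitize-html or htmlparser2.
// Replaces hexadecimal numeric character references (e.g. "&#xe60b;")
// with the character they denote.
function decodeHexEntities(text) {
  return text.replace(/&#x([0-9a-f]+);/gi, (match, hex) => {
    const codePoint = parseInt(hex, 16);
    // Leave out-of-range values untouched instead of throwing.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

// Hypothetical helper: true if the first character lies in the
// Basic Multilingual Plane Private Use Area (U+E000 to U+F8FF).
function isPrivateUse(char) {
  const cp = char.codePointAt(0);
  return cp >= 0xe000 && cp <= 0xf8ff;
}

const decoded = decodeHexEntities("<span>&#xe60b;</span>");
const inner = decoded.slice("<span>".length, -"</span>".length);

console.log(inner.length);                      // one UTF-16 code unit
console.log(inner.codePointAt(0).toString(16)); // the code point, in hex
console.log(isPrivateUse(inner));               // whether it is a private-use character
```

Once decoded, &#xe60b; is the single character U+E60B, which lies in the Unicode Private Use Area. Such characters have no standard glyph, so they may display as blank or as a placeholder unless a font (for example an icon font) defines them.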