apostrophecms / sanitize-html

Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis. Built on htmlparser2 for speed and tolerance

sanitizeHtml throws TypeError on '&' symbol

matejfalat opened this issue

To Reproduce

Step by step instructions to reproduce the behavior:

const sanitizeHtml = require('sanitize-html');

sanitizeHtml('<p>&</p>');
// or
sanitizeHtml('<p>&nbsp</p>');

Expected behavior

Not to crash.

Describe the bug

When the HTML text contains an ampersand, sanitizeHtml() fails with:

Uncaught TypeError: Cannot read properties of undefined (reading '0')
    at Tokenizer.stateBeforeEntity (Tokenizer.js?6fbd:582:1)
    at Tokenizer.parse (Tokenizer.js?6fbd:818:1)
    at Tokenizer.write (Tokenizer.js?6fbd:158:1)
    at Parser.write (Parser.js?5804:459:1)
    at sanitizeHtml (index.js?5e22:578:1)
    at MaterialPreviewPage (MaterialPreviewPage.tsx?d2f2:41:55)

Details

React: 18.2.0
Webpack: 5.75.0

Version of Node.js:
v18.13.0

Server Operating System:
Windows 11, WSL2, and Docker

Hi @matejfalat,
I tested this on a Mac using the repo tests and could not reproduce the error. Is this occurring only in the browser? I wonder whether this is an htmlparser2 issue rather than a sanitize-html issue, since Tokenizer is part of that package. I think we would need a minimal project setup to replicate this error.
Cheers

Yes. What we would ask is that you contribute a PR with a failing unit test to this repo, so we can see how this is possible in the context of this project and avoid confusion with issues that might exist only in a larger project with parts that aren't actually dependencies of this module. htmlparser2 is a dependency, so, browser or not, a bug coming from it should be reproducible in a test.
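
A failing test of the kind requested here might start from something like the sketch below. It is only an illustration under stated assumptions: findThrowingInputs is a hypothetical helper, not part of sanitize-html, and the sanitizer is passed in as a parameter so the check itself does not depend on a bundler or framework. The two inputs are the ones from the report.

```javascript
// Hypothetical helper (not part of sanitize-html): runs a sanitizer over
// each input and records which inputs make it throw.
function findThrowingInputs(sanitize, inputs) {
  const failures = [];
  for (const input of inputs) {
    try {
      sanitize(input);
    } catch (err) {
      // Keep the input and the error message so the failure is easy to report.
      failures.push({ input, message: String(err && err.message) });
    }
  }
  return failures;
}

// Inputs taken from the report in this issue.
const reportedInputs = ['<p>&</p>', '<p>&nbsp</p>'];
```

Inside a unit test, one could then check that findThrowingInputs(require('sanitize-html'), reportedInputs) returns an empty array; a non-empty result would be the failing case the maintainers are asking for.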

Probably because of the missing ";" at the end.
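
If the missing semicolon is indeed the trigger (the thread does not confirm this), one possible stopgap is to escape any "&" that does not start a semicolon-terminated entity reference before the string reaches sanitizeHtml. The sketch below is an assumption, not a fix from the maintainers; escapeBareAmpersands is a hypothetical name, and the regular expression only checks the shape of a reference, not whether the entity name actually exists.

```javascript
// Hypothetical stopgap: escape every "&" that does not begin a
// semicolon-terminated reference such as "&amp;", "&#38;" or "&#x26;".
function escapeBareAmpersands(html) {
  return html.replace(
    /&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);)/g,
    '&amp;'
  );
}

console.log(escapeBareAmpersands('<p>&</p>'));      // <p>&amp;</p>
console.log(escapeBareAmpersands('<p>&nbsp</p>'));  // <p>&amp;nbsp</p>
console.log(escapeBareAmpersands('<p>&nbsp;</p>')); // <p>&nbsp;</p>
```

Note the trade-off: an unterminated "&nbsp" would then be shown as literal text rather than as a non-breaking space, so this only makes sense where avoiding the crash matters more than preserving such input.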