apostrophecms / sanitize-html

Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis. Built on htmlparser2 for speed and tolerance.

Sanitizing general purpose text - Ampersand encoding and '<' or '>'

grapevinegizmos opened this issue

Hi there, I'm trying to use the sanitizer to make sure that general-purpose text entered in an Angular form contains either no tags or only styling tags such as p or i.

I do this by comparing the value of an input field to the value produced after I sanitize the text. If original == sanitized, I allow the text. If not, I mark the input box as having an error and prevent posting.

This works fine so long as the user does not use the characters '<' or '>' (except in the permitted tags) or '&' anywhere in the text. The sanitizer converts these characters to &lt;, &gt; or &amp;, which causes the test to fail.

So text like "The food and Smith & Jones leaves much to be desired", or "If tickets sold is > 100, then buy more tickets", fails the test.

Is there a way to avoid this behavior?

Escaping entities produces correct HTML and causes no problems when rendering. You could submit a PR that optionally escapes & only when it could be mistaken for an entity reference (note that there are many ways those can be formed), or use a separate tool after sanitize-html to replace those entities in the conditions you deem safe.
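
A minimal sketch of the second suggestion, assuming the sanitizer escapes only &, < and > in text as described above. Both `decodeBasicEntities` and `isAcceptable` are hypothetical helpers written for illustration, not part of sanitize-html, and `sanitize` stands for whatever sanitizer call the form already makes.

```javascript
// Hypothetical helper, not part of sanitize-html: reverses only the three
// escapes mentioned in this thread (&amp;, &lt;, &gt;).
function decodeBasicEntities(text) {
  const map = { amp: "&", lt: "<", gt: ">" };
  // A single pass, so "&amp;lt;" becomes "&lt;" and is not decoded twice.
  return text.replace(/&(amp|lt|gt);/g, (match, name) => map[name]);
}

// Hypothetical check: `sanitize` is the sanitizer call the form already uses,
// for example (s) => sanitizeHtml(s, { allowedTags: ["p", "i"] }).
function isAcceptable(original, sanitize) {
  // Compare against the decoded output, so escaping alone is not an error.
  return decodeBasicEntities(sanitize(original)) === original;
}
```

With this comparison, text such as "Smith & Jones" or "sold is > 100" should pass, while input whose tags were stripped still differs from the original and is rejected. The decoded string is only meant for the comparison; the escaped output from the sanitizer is still the one to render. Input that already contains entity references, or markup that the sanitizer normalizes, may still fail the check.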

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.