microcosm-cc / bluemonday

bluemonday: a fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer) to scrub user generated content of XSS

Home Page: https://github.com/microcosm-cc/bluemonday


Detecting when sanitization triggered for an input

jimmiebtlr opened this issue · comments

I'm interested in checking if an input triggered a change in the data (other than whitespace). The idea would be to return an error for that case rather than silently accepting something other than what the user submitted.

Some experimentation with bluemonday shows that whitespace isn't preserved, so I'm wondering whether this is feasible at all.

It wouldn't be ideal, but the nuclear option would probably be to use an allow-all policy, which I assume would still trigger the whitespace changes.

Any thoughts on how to accomplish this?

commented

Hmm... that is a different use-case.

So the way bluemonday works is by copying the input to the output and applying a filter as this happens. The filter is the product of two things: the x/html package, which also escapes HTML entities, and bluemonday, which applies the allowlist so that only the permitted parts are copied. As whitespace in HTML is not significant in the majority of places, there is no attempt to preserve it... and bear in mind that most text will have been changed anyway due to HTML entities.
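To illustrate the point about entities, here is a minimal sketch that uses the standard library's `html.EscapeString` as a stand-in for the escaping step. That stand-in is an assumption for illustration, not bluemonday's own code path, and `EscapeText` and `ChangedByEscaping` are hypothetical helper names. It shows that perfectly harmless text no longer matches the input byte for byte once it has been escaped.

```go
package main

import "html"

// EscapeText is a stand-in for the entity escaping applied to text on its
// way to the output. It uses the standard library's html.EscapeString, which
// is an assumption for illustration, not bluemonday's own code path.
func EscapeText(s string) string {
	return html.EscapeString(s)
}

// ChangedByEscaping reports whether escaping alone alters s, even though
// nothing was removed from it.
func ChangedByEscaping(s string) bool {
	return EscapeText(s) != s
}
```

With this, `EscapeText("Tom & Jerry's")` returns `Tom &amp; Jerry&#39;s`, so a plain string comparison would flag that input as "changed" even though no content was dropped.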

It's not going to be possible to determine this with bluemonday.

To come close you'd need to apply the same HTML entity escaping to the input and normalize whitespace on both sides... then you should be able to do a compare.
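A rough sketch of that kind of comparison, using only the standard library. Note that this is a variant of the idea above: instead of escaping the input, it decodes entities on both sides, so that allowed tags in the input are not escaped by the comparison itself. The names `canonical` and `SanitizationChanged` are hypothetical, and the `sanitized` argument is assumed to be whatever your policy returned, for example from `bluemonday.UGCPolicy().Sanitize(input)`.

```go
package main

import (
	"html"
	"strings"
)

// canonical reduces a string to a form in which whitespace differences and
// entity-encoding differences no longer matter.
func canonical(s string) string {
	// Trim and collapse every run of whitespace to a single space.
	collapsed := strings.Join(strings.Fields(s), " ")
	// Decode entities so "&amp;" and "&" compare as equal.
	return html.UnescapeString(collapsed)
}

// SanitizationChanged reports whether sanitized differs from input by more
// than whitespace and entity encoding. It is an approximation only.
func SanitizationChanged(input, sanitized string) bool {
	return canonical(input) != canonical(sanitized)
}
```

For example, `SanitizationChanged("a & b", "a &amp; b")` is false, while `SanitizationChanged("<script>alert(1)</script>hi", "hi")` is true. This only approximates the check: anything else the sanitizer rewrites, such as attribute quoting or added closing tags, would be reported as a change.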

However, it's not the job of this package to preserve input, so even if you make that work today I may break it in the future. For example, I've been thinking about "fixing" bad input, so if I receive an opening tag without a closing tag I would add the closing tag. Would this be considered "silently accepting something other than what the user submitted"? It should be... and that's the heart of this: the bluemonday library always creates new output.

It's not so much changing the input to make output, it's always creating new output from scratch.