apostrophecms / sanitize-html

Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis. Built on htmlparser2 for speed and tolerance

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Default call should disallow all tags and attributes

hgezim opened this issue · comments

The problem to solve

Looking at a call like sanitizeHtml(userInput), it appears that all html should be sanitized and nothing allowed through (hence, sanitize-html). When the library has implicit defaults it makes me very unsure as a user as to what it'll do.

For example, dev 1 wrote:

sanitizeHtml(variant.product.title, {
    allowedTags: ['br'],
  });

Dev 2 could come later on when we disallow br tags and remove the allowedTags option altogether which would make all the default tags allowed.

The other issues since I know now the library "style", with this code:

sanitizeHtml(variant.product.title, {
    allowedTags: ['br'],
  });

It begs the question, what attributes can br have that might screw me over? Are all of br's attributes allowed? What if the user can set an attribute I don't know about and do something dangerous.

Proposed solution

Change the default API so that nothing is allowed through unless explicitly specified.

sanitizeHtml(userInput) would remove every tag. If I specify an allowed tag, it would allow that tag with 0 attributes.

These snippets would behave the same:

sanitizeHtml(variant.product.title, {
    allowedTags: [],
  });

sanitizeHtml(variant.product.title, {});

I wouldn't feel like I have to watch my back.

Default are great and those lists can be exported as well like so:

import sanitizeHtml, {defaultAllowedTags} from 'sanitize-html';

sanitizeHtml({allowedTags: defaultAllowedTags});

Alternatives

Instead of completely breaking backward compatibility, a strict export could be introduced:

import {sanitizeHtmlStrict, defaultAllowedTags} from 'sanitize-html';

sanitizeHtmlStrict(userInput);

Additional context

The only context was me doing code review and not being certain as to what will take place when allowedTags ends up changing, etc.

For me, it's about the safety. I don't feel safe that this two lines don't produce the same results:

sanitizeHtml(variant.product.title, {
    allowedTags: [],
  });

sanitizeHtml(variant.product.title, {});

I think you could easily end up with a nasty bug if you initially have allowedTags: ['br'] then later on decide to not allow br tags either.

What do you do? The API doesn't make it clear enough that you should just remove the 'br' and keep the empty array.

I see your point, but I also think there's a lot to be said for sensible defaults for the "allow basic HTML" use case.

@abea @stuartromanek what do you think?

Stepping back to the big picture, the module is called 'sanitize-html', not 'strip-html' or 'remove-html' .. it's purpose first is to return a consistent format for HTML. To that end, it's reasonable for the module to have a default set of tags/attrs/etc that it deems 'sanitary'. This would be easier to express if HTML5 had a DTD or something official to root the default definition in, but alas.

I see this more as a documentation issue. @hgezim's use case is directly referenced in the README but overall the README is verbose and frames a lot of scenarios in a less technical, more conversational way. This makes the README hard to read/scan (it takes almost 3,000 words to get the first basic usage example!).

Great points @stuartromanek. I think an API call table as a reference would help that lists required arguments, defaults and options.

Here's an example:
CleanShot 2021-12-03 at 14 56 19@2x

commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.