apostrophecms / sanitize-html

Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis. Built on htmlparser2 for speed and tolerance.

Allow tags in tags

cgorrieri opened this issue

The problem to solve

I would like to restrict which tags may appear inside a given tag. For example, only allow BR tags inside P or LI elements.

Proposed solution

Extend allowedTags so that each entry can be either a tag name or a nested list pairing a tag name with its own options, for example:

allowedTags: [
  'ul',
  ['li', { allowedTags: ['b', 'span', 'u', ...]}],
  ...
]
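
To make the proposed shape concrete, here is a minimal sketch of how such a mixed list could be read. The names normalizeAllowedTags and isChildAllowed are hypothetical and are not part of sanitize-html. The sketch also assumes that a per-tag allowedTags list replaces the top-level list for direct children of that tag, which the proposal does not spell out.

```javascript
// Hypothetical helper, not part of sanitize-html: turns the proposed mixed
// list into a Map from tag name to its allowed child tags (or null).
function normalizeAllowedTags(allowedTags) {
  const rules = new Map();
  for (const entry of allowedTags) {
    if (typeof entry === 'string') {
      // Existing syntax: a bare tag name carries no child restriction.
      rules.set(entry, null);
    } else if (Array.isArray(entry) && typeof entry[0] === 'string') {
      // Proposed syntax: [tagName, { allowedTags: [...] }].
      const [name, options = {}] = entry;
      rules.set(
        name,
        Array.isArray(options.allowedTags) ? new Set(options.allowedTags) : null
      );
    } else {
      throw new TypeError('Unsupported allowedTags entry: ' + JSON.stringify(entry));
    }
  }
  return rules;
}

// Assumption: a parent's own list, when present, decides which direct
// children are kept; otherwise the top-level list applies.
function isChildAllowed(rules, parent, child) {
  const restriction = rules.get(parent);
  if (restriction) return restriction.has(child);
  return rules.has(child);
}

const rules = normalizeAllowedTags([
  'ul',
  'p',
  ['li', { allowedTags: ['b', 'span', 'u'] }]
]);
console.log(isChildAllowed(rules, 'li', 'b')); // true
console.log(isChildAllowed(rules, 'li', 'p')); // false
```

Because bare strings are still accepted and carry no restriction, a configuration written in the existing syntax would behave the same under this reading.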

Alternatives

  1. Use filters to parse the content of specific elements with a different parser. However, this can conflict with the main parser, which parses that content regardless.
  2. Update transformTags so that it receives the text content and can transform it. Again, this can conflict with the main parser.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

This could be useful for cleaning nested elements. Let's keep it open.

@cgorrieri Did you manage to solve this problem? Given

<p>
  <p>
    Some incorrectly nested text
  </p>
</p>

I'd love to be able to get

<p>
  Some incorrectly nested text
</p>

after sanitizing the HTML string.

The proposal is a good one. I'm not 100% sure about the syntax, though, and I'm curious what @BoDonkey thinks.

This isn't a requirement we have in-house, so I'll reopen it and tag it "contributions welcome".

In my case, I ended up running a regex over the input before parsing to remove the tags I didn't want nested.
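
For readers who want to try the same workaround, a regex pre-pass along those lines might look like the sketch below. The actual regex used in this thread was not posted, so unwrapSelfNested is a hypothetical helper, not part of sanitize-html. It only handles a tag nested inside the same tag, and it assumes a plain tag name and attribute values that do not contain ">".

```javascript
// Hypothetical helper: removes opening and closing tags of `tagName` that
// occur inside another element of the same name, keeping their content.
function unwrapSelfNested(html, tagName) {
  // Matches <tag ...> and </tag>, but not longer names such as <pre> for "p".
  const tagPattern = new RegExp('<(/?)' + tagName + '(?=[\\s/>])[^>]*>', 'gi');
  let depth = 0;
  return html.replace(tagPattern, (match, slash) => {
    if (slash) {
      // Stray closing tag with nothing open: leave it for the sanitizer.
      if (depth === 0) return match;
      depth -= 1;
      // Keep the closing tag only when it closes the outermost element.
      return depth === 0 ? match : '';
    }
    // Keep the opening tag only when we are not already inside one.
    depth += 1;
    return depth === 1 ? match : '';
  });
}

const dirty = '<p><p>Some incorrectly nested text</p></p>';
console.log(unwrapSelfNested(dirty, 'p'));
// <p>Some incorrectly nested text</p>
```

The result would then be passed to the sanitizer as usual. Sibling elements are left alone because the depth counter returns to zero between them, and whitespace around the removed tags is kept as-is.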

I like the idea of this new feature. IMO, whoever tackles it will need to be careful to preserve backward compatibility and to keep the feature opt-in. So it should not ship as a list of defaults that developers can then modify, which is how allowedTags currently works. There seem to be too many edge cases to provide sensible defaults.

The existing syntax must continue to work, yes.