microcosm-cc / bluemonday

bluemonday: a fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer) to scrub user generated content of XSS

Home Page:https://github.com/microcosm-cc/bluemonday

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

<a> tags in tables not matched correctly

matloob opened this issue · comments

Here's a program I ran: https://go.dev/play/p/iT5Rcoe1XRz copied below


import (
	"fmt"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	text := `
<a name="foo">
<table>
<tr>
<td><a href="page">text</a></td>
</tr>
</table>
</a>
`

	p := bluemonday.UGCPolicy()
	s := string(p.SanitizeBytes([]byte(text)))
	fmt.Println(s)
}

The output is



<table>
<tr>
<td><a href="page" rel="nofollow">text</td>
</tr>
</table>
</a>

But I would expect the output to be



<table>
<tr>
<td><a href="page" rel="nofollow">text</a></td>
</tr>
</table>

That is, the closing a tag in the td should be matched to the starting a tag in the td, but instead, the ending a tag in the text is matched to it.

commented

this doesn't feel like it is valid nor should work.

the a element is not allowed to be a descendent of another a element, and that is what you've done by wrapping the table in the outer <a name="foo"> (which btw is also deprecated/obsolete and you should use id instead of name).

by nesting the anchors you've broken the little state machine that keeps track of the tags opened and closed, and the last open anchor is preserved, as is the last close anchor.

to me, whilst yes it breaks the state machine, this isn't valid input... and bluemonday is GIGO (garbage in, garbage out) and so I wouldn't look to fix this.

that said, if the error can be reproduced on valid input then it would be a problem that I should fix.

I used https://validator.w3.org/nu/#textarea to validate the input as I did feel that nesting anchors shouldn't work (in addition to wrapping a table with an anchor)... as the only used of a name or id anchor is to navigate to the top of it, why not just close that immediately?