microcosm-cc / bluemonday

bluemonday: a fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer) to scrub user generated content of XSS

Home Page:https://github.com/microcosm-cc/bluemonday

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Parsing Issue handling self closing html tag

sjawaji opened this issue · comments

Looks there is some issue parsing self closing html tags. Not sure if this is a known issue.

package main

import (
	"fmt"
	"github.com/microcosm-cc/bluemonday"
)

func main() {
	p := bluemonday.NewPolicy()
	p.AllowElements("p")

	htmlPayload := `<html xmlns="http://www.w3.org/1999/xhtml"><head><title/></head><body><p>body</p></body></html>`

	body := p.Sanitize(htmlPayload)
	fmt.Println(body)
}

Result:

&lt;/head&gt;&lt;body&gt;&lt;p&gt;body&lt;/p&gt;&lt;/body&gt;&lt;/html&gt;

If I remove the self closing title <title/> tag from the payload, then it is working fine and output is as expected.

<p>body</p>
commented

Garbage in, garbage out applies.

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/title

Both opening and closing tags are required. Note that leaving off </title> should cause the browser to ignore the rest of the page.

That is indeed what happens, the rest of the page is ignored (as HTML) and thus just escaped to make it safe.

Valid HTML is expected, but if not then bluemonday will make it safe to display within a HTML page (from an XSS perspective).

You should either omit the title tag or close it as per HTML specification.

This behaviour in bluemonday, comes from the go/x/html package but it is correct.