Parsing Issue handling self closing html tag
sjawaji opened this issue · comments
Looks there is some issue parsing self closing html tags. Not sure if this is a known issue.
package main
import (
"fmt"
"github.com/microcosm-cc/bluemonday"
)
func main() {
p := bluemonday.NewPolicy()
p.AllowElements("p")
htmlPayload := `<html xmlns="http://www.w3.org/1999/xhtml"><head><title/></head><body><p>body</p></body></html>`
body := p.Sanitize(htmlPayload)
fmt.Println(body)
}
Result:
</head><body><p>body</p></body></html>
If I remove the self closing title <title/>
tag from the payload, then it is working fine and output is as expected.
<p>body</p>
Garbage in, garbage out applies.
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/title
Both opening and closing tags are required. Note that leaving off </title> should cause the browser to ignore the rest of the page.
That is indeed what happens, the rest of the page is ignored (as HTML) and thus just escaped to make it safe.
Valid HTML is expected, but if not then bluemonday will make it safe to display within a HTML page (from an XSS perspective).
You should either omit the title
tag or close it as per HTML specification.
This behaviour in bluemonday, comes from the go/x/html package but it is correct.