Double escaping attribute values
jtran opened this issue
Attribute values are getting double-escaped. As far as I can tell, the problem was introduced in f0057e2. In particular, the double-quote character and non-breaking spaces never make it through sanitization intact.
In code, I expect the following test to pass.
func TestQuotesSanitization(t *testing.T) {
	tests := []test{
		{
			in:       `<p title="&quot;"></p>`,
			expected: `<p title="&quot;"></p>`,
		},
		{
			in:       `<p title="&nbsp;"></p>`,
			expected: `<p title="&nbsp;"></p>`,
		},
	}

	p := UGCPolicy()
	p.AllowAttrs("title").OnElements("p")

	// These tests are run concurrently to enable the race detector to pick up
	// potential issues
	wg := sync.WaitGroup{}
	wg.Add(len(tests))
	for ii, tt := range tests {
		go func(ii int, tt test) {
			out := p.Sanitize(tt.in)
			if out != tt.expected {
				t.Errorf(
					"test %d failed;\ninput : %s\noutput : %s\nexpected: %s",
					ii,
					tt.in,
					out,
					tt.expected,
				)
			}
			wg.Done()
		}(ii, tt)
	}
	wg.Wait()
}
However, I get this output.
=== RUN TestQuotesSanitization
sanitize_test.go:3713: test 1 failed;
input : <p title="&nbsp;"></p>
output : <p title="&amp;nbsp;"></p>
expected: <p title="&nbsp;"></p>
sanitize_test.go:3713: test 0 failed;
input : <p title="&quot;"></p>
output : <p title="&amp;quot;"></p>
expected: <p title="&quot;"></p>
Looks like I introduced a bug when attempting to fix HREF sanitization: f0057e2
However, when I remove the double sanitizing from that commit and rerun the tests I wrote for that code, the output doesn't pose a risk.
That is, removing the double escaping still results in sanitized, safe-to-use output.
As such, I'm going to remove the part of that prior commit that led to the double escaping.