Using strip_tags for real-world html sanitization is vulnerable to circumvention.

Question

scryptonite opened this issue 7 years ago

Description

As the title says, it is possible to circumvent the purpose of strip_tags by crafting a string so that the final output still contains unstripped HTML. This differs from the behavior of the same function in PHP-land, which seems to always guarantee that unpermitted HTML tags are removed.

Example:

const strip_tags = require("locutus/php/strings/strip_tags");

let treat = strip_tags('<script>console.log("everything is fine")</script>');
console.assert(treat === 'console.log("everything is fine")');
// Assertion passes.
// Worked exactly as intended...

let trick = strip_tags('<<foo>script>console.log("all your base are belong to us")<</foo>/script>');
console.assert(trick === 'console.log("all your base are belong to us")');
// Assertion fails!
// It would be dangerous and unwise to put the contents of `trick` in browser-land without
//   doing something else to the string.

I actually discovered and (ab)used this technique against a Twitch overlay that was attempting to sanitize chat messages before displaying them on stream... Their 'fix' at the time was to remove all < characters from chat messages being displayed. 💔

I think that to reach parity with how strip_tags works in PHP, the function will need to recursively strip tags until there is nothing left to remove. I might also recommend adding a comment or remark somewhere that educates unwitting users that the htmlentities function might be better suited for their sanitization needs in browser-land.
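
To make the two suggestions above concrete, here is a rough sketch. `stripTagsOnce` is a hypothetical single-pass, regex-based stripper standing in for the current behavior; it is not the locutus source. `stripTagsUntilStable` and `escapeHtml` are likewise names made up for illustration. I have not verified that repeated stripping matches PHP's output in every case, so treat this as an idea rather than a fix.

```javascript
// Hypothetical single-pass stripper (an assumption, NOT the locutus code):
// removes anything that looks like <name ...> or </name ...> exactly once.
function stripTagsOnce(input) {
  return String(input).replace(/<\/?[a-z][a-z0-9]*\b[^>]*>/gi, '');
}

// Re-run the single pass until the string stops changing, so tags that only
// appear after an inner tag has been removed are stripped as well.
function stripTagsUntilStable(input) {
  let previous;
  let current = String(input);
  do {
    previous = current;
    current = stripTagsOnce(previous);
  } while (current !== previous);
  return current;
}

// Alternative: escape instead of strip, in the spirit of htmlentities.
// Only the five characters significant in HTML text and attributes are
// replaced here; `&` must be handled first to avoid double-escaping.
function escapeHtml(input) {
  return String(input)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

const payload = '<<foo>script>console.log("all your base are belong to us")<</foo>/script>';

console.log(stripTagsOnce(payload));        // one pass leaves a <script> element behind
console.log(stripTagsUntilStable(payload)); // repeated passes leave only the text
console.log(escapeHtml(payload));           // escaping leaves no raw < or > at all
```

Escaping is the more conservative of the two when the string is going straight into a page, since nothing in the output can be parsed as markup regardless of how the input was nested.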

🎃

Rafał Kukawski · Answer 1 · Wed Nov 01 2017 14:41:30 GMT+0800 (China Standard Time)

Thanks for reporting the issue. I'll try to fix it. Currently the function relies a little too much on regular expressions, which are not well suited for HTML markup.
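
For illustration only, one way to avoid regular expressions is a character-by-character scan that tracks whether it is currently inside a tag. The function name `stripTagsByScanning` is hypothetical, and this is a minimal sketch rather than the eventual fix: it ignores allowed tags, comments, and `>` characters inside quoted attribute values.

```javascript
// Minimal scanner sketch (an assumption, not the locutus implementation).
// Everything from a '<' up to and including the next '>' is dropped, so the
// output can never contain a '<' character.
function stripTagsByScanning(input) {
  let output = '';
  let insideTag = false;
  for (const ch of String(input)) {
    if (insideTag) {
      // Skip characters until the tag closes.
      if (ch === '>') insideTag = false;
    } else if (ch === '<') {
      insideTag = true;
    } else {
      output += ch;
    }
  }
  return output;
}

const nested = '<<foo>script>console.log("all your base are belong to us")<</foo>/script>';
console.log(stripTagsByScanning(nested));
// Leftover fragments such as "script>" remain as plain text, but no tag survives.
```

Note that an unterminated `<` causes the rest of the input to be discarded, which is a deliberate simplification in this sketch.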