chrisisbeef / jquery-encoder

Contextual Output Encoding for jQuery

canonicalize modifies an unencoded string

krinsane opened this issue

In other words, canonicalize treats a string as encoded when it actually is not, so if I do something like

$.encoder.encodeForHTML($.encoder.canonicalize(string)), it gives me a different string

The string in question is something like this: "sdf\sdf\sdf"

Canonicalize transforms it into this: sdf�sdf�sdf

I'm not sure there is a way around this problem: \s will be considered a control character and will be decoded by canonicalization. Even if you were to escape the \ (as in \\s), it would still be normalized and decoded on the subsequent pass. Is there any other character that could be used in place of the backslash, which is the control character marker for most programming languages? Changing the encoder would allow an attacker to pass control characters using multiple-encoding attacks, which is less than ideal.
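To make the point about repeated passes concrete, here is a minimal sketch of a backslash-escape decoder applied until the string stops changing. The names decodeCssEscapes and canonicalizeSketch are hypothetical, and this is a simplified stand-in, not jquery-encoder's actual CSSCodec, so its exact output may differ from what the library produces.

```javascript
// Toy decoder for CSS-style backslash escapes (an illustration only,
// NOT jquery-encoder's CSSCodec).
function decodeCssEscapes(input) {
  // A backslash followed by 1-6 hex digits becomes that code point;
  // a backslash followed by any other character becomes that character.
  return input.replace(/\\([0-9a-fA-F]{1,6})|\\([\s\S])/g, function (match, hex, other) {
    if (hex === undefined) {
      return other;
    }
    var codePoint = parseInt(hex, 16);
    // Leave out-of-range values untouched instead of throwing.
    return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
  });
}

// Keep decoding until a pass makes no further change.
function canonicalizeSketch(input) {
  var previous;
  var current = input;
  do {
    previous = current;
    current = decodeCssEscapes(previous);
  } while (current !== previous);
  return current;
}

// The backslash is consumed in every case, even when it is doubled first.
var plain = canonicalizeSketch('sdf\\sdf\\sdf'); // input is sdf\sdf\sdf
var doubled = canonicalizeSketch('\\\\s');       // input is \\s
var hexLike = canonicalizeSketch('c:\\ext');     // input is c:\ext
```

In this sketch the doubled backslash only survives the first pass; the second pass decodes what is left, which is why escaping the backslash does not help. A sequence such as \e is worse still, because e is a valid hex digit and the pair is replaced by a control character.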

I think a way around this is to provide a separate canonicalize API for code. My use case is a regex typed into an input field. People could then choose which canonicalize function to use for values where code is expected. The same encoder is fine.

To continue on my last comment:

Let's say I have a wrapper function encodeForCode

it will have the following:

function encodeForCode(string) {
    // canonicalizeForCode is the proposed new API, not an existing function
    return $.encoder.encodeForHTML($.encoder.canonicalizeForCode(string));
}

By its nature, canonicalization is intended to reduce a string to its simplest form, that is, to replace any escaped characters with their character representations, so there is only one canonicalize function. I'm not sure I see a use for more than that. I can, however, see a use case for allowing customization of the codecs that are used for canonicalization.

So basically you would be able to customise the behavior of canonicalization and what it interprets as a control character.

Like this

function encodeForCode(strInput) {
   return $.encoder.encodeForHTML($.encoder.canonicalize({input: strInput, codecs: [ new HTMLEntityCodec(), new PercentCodec() ]}));
}

This would prevent the \ from being interpreted as a control character and canonicalized, since backslash escaping is CSS syntax and the CSS codec is left out.

Hey @stuartf - trying to close the loop on some of these older issues. Does the suggested fix accommodate your requirements?

@nicolaasmatthijs @simong did we work around this somehow, or is it still a problem for oae?

I just ran into the exact same problem today when a legitimate user input string contained backslashes (an attempt to share a Windows file path, e.g. "c:\ext").

Going to look into the above suggestion by @chrisisbeef and will post the outcome.

Avoiding the CSSCodec in the canonicalize function worked for me.

Note to anyone else experiencing this problem:
The example code above by @chrisisbeef is an incomplete hypothetical customization. The current canonicalize function has the codecs var hard-coded to use all 3 codecs. If you want to pass in different codecs as in the example above, the canonicalize function also needs to be modified.
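As a rough illustration of the kind of modification described in the note above, the following self-contained sketch shows a canonicalize-style function that takes its codec list as a parameter. Everything here is an assumption made for illustration: canonicalizeWith and the three stand-in codec objects are hypothetical and far simpler than the library's HTMLEntityCodec, PercentCodec and CSSCodec.

```javascript
// Stand-in codecs, each exposing decode(string) -> string.
// These are simplified placeholders, not the library's codec classes.
var percentCodec = {
  decode: function (s) {
    return s.replace(/%([0-9a-fA-F]{2})/g, function (m, hex) {
      return String.fromCharCode(parseInt(hex, 16));
    });
  }
};

var htmlEntityCodec = {
  decode: function (s) {
    var named = { lt: '<', gt: '>', amp: '&', quot: '"' };
    return s.replace(/&(?:#(\d+)|#x([0-9a-fA-F]+)|(lt|gt|amp|quot));/g,
      function (m, dec, hex, name) {
        if (name) {
          return named[name];
        }
        return String.fromCharCode(parseInt(dec || hex, dec ? 10 : 16));
      });
  }
};

var cssCodec = {
  decode: function (s) {
    return s.replace(/\\([0-9a-fA-F]{1,6})|\\([\s\S])/g, function (m, hex, other) {
      if (hex === undefined) {
        return other;
      }
      var codePoint = parseInt(hex, 16);
      return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : m;
    });
  }
};

// Hypothetical canonicalize variant: the caller chooses the codecs.
// Every codec is applied in turn, and passes repeat until nothing changes,
// so multiply-encoded input is still reduced for the codecs that remain.
function canonicalizeWith(input, codecs) {
  var previous;
  var current = input;
  do {
    previous = current;
    codecs.forEach(function (codec) {
      current = codec.decode(current);
    });
  } while (current !== previous);
  return current;
}

var withoutCss = [htmlEntityCodec, percentCodec];
var keptPath = canonicalizeWith('c:\\ext', withoutCss);         // input is c:\ext
var stillDecoded = canonicalizeWith('%253Cb%253E', withoutCss); // double percent-encoding
```

With the CSS stand-in omitted, the backslash in the file path is left alone, while double percent-encoded input is still reduced to its plain form. The trade-off is the one raised earlier in the thread: any escape syntax whose codec is dropped is no longer canonicalized, so this only makes sense for values that never end up in that context.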