OWASP / owasp-java-encoder

The OWASP Java Encoder is a Java 1.5+ simple-to-use drop-in high-performance encoder class with no dependencies and little baggage. This project will help Java web developers defend against Cross Site Scripting!

Home Page:https://owasp.org/www-project-java-encoder/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Encode.forJavaScript()

ndp-opendap opened this issue · comments

Greetings,

When I do the following:

    String val = "CF-1.5";
    String jsVal = Encode.forJavaScript(val);

The ending value of jsVal is "CF\-1.5". Based on the javadoc for the Encode class here:

https://owasp.github.io/owasp-java-encoder/encoder/apidocs/index.html?index-all.html

I believe that Encode.forJavaScript() is incorrectly/needlessly escaping the "-" character.

The javadoc does not list "-" as a character to be escaped, and the RFC for JSON:

https://tools.ietf.org/html/rfc7159#section-7

Does not list the "-" character as one that is included in the list of "two-character sequence escape representations of some popular characters".

It seems that since any character can be escaped there is no issue in escaping the "-" character, but I do not think there is a two character shorthand "\-" supported and that the hex escaped version must be used.

Is this a known problem? Has no one else had their JSON parser throw a rod because of this unsupported escape sequence?

Thanks,

Nathan

Perhaps the application of this rule?

if (mode == Mode.BLOCK || mode == Mode.HTML) {
        // in <script> blocks, we need to prevent the browser from seeing
        // "</anything>" and "<!--". To do so we escape "/" as "\/" and
        // escape "-" as "\-". 

I think so.

Sorry for the spam above, but I agree with the original post by Nathan and with the analysis by @pthorson.

https://github.com/OWASP/owasp-java-encoder/blob/789a380/core/src/main/java/org/owasp/encoder/JavaScriptEncoder.java#L128

The encoding of raw - as raw \- results in an invalid JSON (but a valid ECMAScript) string literal.

string = quotation-mark *char quotation-mark

      char = unescaped /
          escape (
              %x22 /          ; "    quotation mark  U+0022
              %x5C /          ; \    reverse solidus U+005C
              %x2F /          ; /    solidus         U+002F
              %x62 /          ; b    backspace       U+0008
              %x66 /          ; f    form feed       U+000C
              %x6E /          ; n    line feed       U+000A
              %x72 /          ; r    carriage return U+000D
              %x74 /          ; t    tab             U+0009
              %x75 4HEXDIG )  ; uXXXX                U+XXXX

      escape = %x5C              ; \

      quotation-mark = %x22      ; "

      unescaped = %x20-21 / %x23-2E / %x30-5B / %x5D-10FFFF

https://www.rfc-editor.org/errata_search.php?rfc=8259

CharacterEscapeSequence ::
    SingleEscapeCharacter
    NonEscapeCharacter

SingleEscapeCharacter :: one of
    ' " \ b f n r t v

NonEscapeCharacter ::
    SourceCharacter but not one of EscapeCharacter or LineTerminator

https://www.ecma-international.org/ecma-262/6.0/#sec-literals-string-literals
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String#Escape_notation
https://www.json.org/

There is little need to have the "mode" parameter in the Javascript string encoder. It brings unnecessary usage complexity as encoding for the worst case would still be parseable in the easiest use case as well.

In addition, the existing code missed extra attack surfaces,

https://security.stackexchange.com/questions/11091/is-this-json-encoding-vulnerable-to-cdata-injection/11097#11097

This suggests to encode with raw \uHHHH the following single characters, regardless of "mode" (and add another method signature without the parameter, deprecating the current method). This will add to the existing encodings of the double quote raw " as raw \" et c. but remove the ECMA compatible non-JSON encoding raw \-. The encoding of the forward slash raw / as raw \/ appears compatible with both ECMA and JSON. To sum up, these characters need replacing on output.

  • the double quote raw " with raw \" against escaping a javascript string surrounded by double quotes,
  • the single quote raw ' with raw \u0027 against escaping a javascript string surrounded by single quotes,
  • the forward slash raw / with raw \/ against constructing a closing script tag in a javascript string,
  • the opening angle bracket raw < against closing the script tag or opening a comment tag,
  • the closing angle bracket raw > against closing a possible CDATA wrapper around the javascript text embedded in XHTML (and HTML?),
  • U+2028 and U+2029 allowed in JSON but not in Javascript against disrupting the javascript engine.
  • the ampersand raw & against the mandatory HTML entity decoding in XHTML documents embedding scripts without a CDATA wrapper. (HTML documents do not apply the entity decoding to embedded scripts but the suggested string literal encoding does not change the string's value).

I understand that these layers look secondary to JSON or Javascript string literal encoding per se, but leaving the task of protecting the string literal against these layers would require re-implementing the encoder for each use case. Attempting to apply additional encoding with simple .replace() chaining will suffer from destroying the context of the backslash-protected characters. Luckily, the suggested "preliminary optimization" by using the raw \uHHHH encoding is both acceptable by the lower level interpreter (javascript engine) and defending against additional interpreters on top of it.

This seems like a good argument. Would you care to give us a PR? I'll also as Jeff to take a look at this.

PS: The earlier spam was accidental and I deleted it. I'm back to updating this project.

After speaking to Jeff on this, we have the following response and wish to close the issue and open a new feature request.

Summary: encodeForJavaScript is not meant to be safe for JSON, but we do plan to add a new JSON encode function, in the meantime consider https://github.com/yahoo/serialize-javascript type logic to safely embed JSON on a page.