escaping of reserved characters

Question

escaping of reserved characters

gdm85 opened this issue 9 years ago · comments

If you use '%' in any string field, the character will not be correctly escaped when passed via the json GET querystring parameter.

This is the diff I used to narrow down the bug:

diff --git a/myservice-letmegrpc/myservice.letmegrpc.go b/myservice-letmegrpc/myservice.letmegrpc.go
index 389156e..9f8b747 100644
--- a/myservice-letmegrpc/myservice.letmegrpc.go
+++ b/myservice-letmegrpc/myservice.letmegrpc.go
@@ -35,7 +35,10 @@ It has these top-level messages:
 */
 package myservice

-import net_http "net/http"
+import (
+       net_http "net/http"
+       "net/url"
+)
 import encoding_json "encoding/json"
 import io "io"
 import golang_org_x_net_context "golang.org/x/net/context"
@@ -2437,7 +2440,17 @@ s += '<div class="children" type="Something_something">' + buildSomething_som

 func (this *htmlmyservice) CreateSomething(w net_http.ResponseWriter, req *net_http.Request) {
        w.Write([]byte(Header(`myservice`, `CreateSomething`)))
-       jsonString := req.FormValue("json")
+
+       rawQuery := req.URL.RawQuery[len("json="):]
+       fmt.Println("raw query =", rawQuery)
+
+       //      jsonString := req.FormValue("json")
+       jsonString, err := url.QueryUnescape(rawQuery)
+       if err != nil {
+               w.Write([]byte("<div class=\"alert alert-danger\" role=\"alert\">" + err.Error() + "</div>"))
+               return
+       }
+
        someValue := false
        msg := &CreateSomethingRequest{}
        if len(jsonString) > 0 {
@@ -2450,6 +2463,8 @@ func (this *htmlmyservice) CreateSomething(w net_http.ResponseWriter, req
                        w.Write([]byte("<div class=\"alert alert-danger\" role=\"alert\">" + err.Error() + "</div>"))
                }
                someValue = true
+       } else {
+               w.Write([]byte("<div class=\"alert alert-info\" role=\"alert\">" + `No JSON sent` + "</div>"))
        }
        w.Write([]byte(Formmyservice_CreateSomething))
        if someValue {

Walter Schulze · Answer 1 · Thu Dec 17 2015 17:29:00 GMT+0800 (China Standard Time)

There is definitely a problem here.
I am investigating further about how to fix this.

Walter Schulze · Answer 2 · Thu Dec 17 2015 17:42:38 GMT+0800 (China Standard Time)

This should fix the problem
d9e70ab

gdm85 · Answer 3 · Fri Dec 18 2015 04:35:58 GMT+0800 (China Standard Time)

@awalterschulze yes I can see that will fix the issue, but wouldn't be best to simply perform URL encoding when the json= querystring is composed?

Walter Schulze · Answer 4 · Fri Dec 18 2015 16:10:39 GMT+0800 (China Standard Time)

Sorry I am really a noob web programmer.

Wouldn't it make the rest of the url less readable/hackable?

Walter Schulze · Answer 5 · Fri Dec 18 2015 16:11:57 GMT+0800 (China Standard Time)

I don't want this

{"bla":1}

To reslut in

%7B%22bla%22%3A1%7D

gdm85 · Answer 6 · Sat Dec 19 2015 03:16:24 GMT+0800 (China Standard Time)

@awalterschulze well, if you are missing other sequences that need escaping, it will simply break.

I suggest to use the correct JavaScript function encodeURIComponent and then if you want to have some characters encoded by the browser (that's what happens when you type them non-encoded, and when it doesn't happen it just breaks server-side as per OP) you can replace after the call to the proper encoding function. The list of reserved characters is here for example: https://en.wikipedia.org/wiki/Percent-encoding

Walter Schulze · Answer 7 · Sat Dec 19 2015 14:18:24 GMT+0800 (China Standard Time)

I am going on holiday, but I'll be back in about a week, then I'll look at this again.
Thank you for the help.

Walter Schulze · Answer 8 · Mon Jan 04 2016 18:03:34 GMT+0800 (China Standard Time)

Yes if there are other sequences that require escaping it will break :(

I am struggling to understand the second paragraph.

The wikipedia page says that encodeURIComponent is a non-standard function, or am I reading it wrong?
Maybe there is a better js function?

Reserved characters that don't cause problems

! $ ( ) * + , / : ? @ [ ]

Reserved characters that cause problems

# ; =

Reserved characters that cause problems that have been fixed

& '

Percent is also a problem, but has now been fixed

But I haven't taken into account character sequences.
Do you have any examples?

gdm85 · Answer 9 · Mon Feb 22 2016 18:26:09 GMT+0800 (China Standard Time)

@awalterschulze sorry, this issue went out of my radar.

The wikipedia page says

Non-standard implementations
There exists a non-standard encoding for Unicode characters: %uxxxx, where xxxx is a UTF-16 code unit represented as four hexadecimal digits. This behavior is not specified by any RFC and has been rejected by the W3C. The third edition of ECMA-262 still includes an escape function that uses this syntax, along with encodeURI and encodeURIComponent functions, which apply UTF-8 encoding to a string, then percent-escape the resulting bytes.

I think you're reading it wrong :) It's referring to unicode not specifically to the encodeURIComponent function which is standard since ECMA3, a summary is here: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent

What is non-standard is not encoding some characters, because browsers might vary in the way of handling it. But let's say that letmegrpc is only supposed to be used with a few browsers that do support our hack with some characters, then what I proposed before in that 2nd paragraph was:

use encodeURIComponent on all input from user
do a string replace for some of the characters you want instead the browser to take care of e.g. %24%20%2B%20%2C%20%2F%20%3A%20%3F%20%40%20%5B%20%5D%20%7B%20%7D (which is the encoding for $ + , / : ? @ [ ] { })

What I mean in (2) is that you have some decoding with a simple string replacement (it's safe to do on an encoded string). Example snippet:

function encodePartially(userInput) {
    var encoded = encodeURIComponent(userInput);

    return encoded.replace("%24", "$").replace("%7D", "}"); // add more here
}

I would still suggest to limit the replacement, leaving out of this 2nd-pass replacement ? for example (a known query-string delimiter)

Walter Schulze · Answer 10 · Tue Feb 23 2016 19:04:07 GMT+0800 (China Standard Time)

Ok so you want to use encodeURIComponent and then "decode" some of the encoded characters?
I just want to make sure I understand you clearly.
Instead of only encoding some of the characters.
This way their will be a better catch all, but we can still try and make the URL readable and editable by "decoding" most of the characters.
Is that what you meant?

gdm85 · Answer 11 · Wed Feb 24 2016 22:43:17 GMT+0800 (China Standard Time)

@awalterschulze yes that! :)

Walter Schulze · Answer 12 · Thu Feb 25 2016 14:57:48 GMT+0800 (China Standard Time)

Ok so characters that would be decoded include:

, } { <all ascii characters> "

Does that sound about right?