escaping of reserved characters
gdm85 opened this issue · comments
If you use '%' in any string field, the character will not be correctly escaped when passed via the json
GET querystring parameter.
This is the diff I used to narrow down the bug:
diff --git a/myservice-letmegrpc/myservice.letmegrpc.go b/myservice-letmegrpc/myservice.letmegrpc.go
index 389156e..9f8b747 100644
--- a/myservice-letmegrpc/myservice.letmegrpc.go
+++ b/myservice-letmegrpc/myservice.letmegrpc.go
@@ -35,7 +35,10 @@ It has these top-level messages:
*/
package myservice
-import net_http "net/http"
+import (
+ net_http "net/http"
+ "net/url"
+)
import encoding_json "encoding/json"
import io "io"
import golang_org_x_net_context "golang.org/x/net/context"
@@ -2437,7 +2440,17 @@ s += '<div class="children" type="Something_something">' + buildSomething_som
func (this *htmlmyservice) CreateSomething(w net_http.ResponseWriter, req *net_http.Request) {
w.Write([]byte(Header(`myservice`, `CreateSomething`)))
- jsonString := req.FormValue("json")
+
+ rawQuery := req.URL.RawQuery[len("json="):]
+ fmt.Println("raw query =", rawQuery)
+
+ // jsonString := req.FormValue("json")
+ jsonString, err := url.QueryUnescape(rawQuery)
+ if err != nil {
+ w.Write([]byte("<div class=\"alert alert-danger\" role=\"alert\">" + err.Error() + "</div>"))
+ return
+ }
+
someValue := false
msg := &CreateSomethingRequest{}
if len(jsonString) > 0 {
@@ -2450,6 +2463,8 @@ func (this *htmlmyservice) CreateSomething(w net_http.ResponseWriter, req
w.Write([]byte("<div class=\"alert alert-danger\" role=\"alert\">" + err.Error() + "</div>"))
}
someValue = true
+ } else {
+ w.Write([]byte("<div class=\"alert alert-info\" role=\"alert\">" + `No JSON sent` + "</div>"))
}
w.Write([]byte(Formmyservice_CreateSomething))
if someValue {
There is definitely a problem here.
I am investigating further about how to fix this.
This should fix the problem
d9e70ab
@awalterschulze yes I can see that will fix the issue, but wouldn't be best to simply perform URL encoding when the json=
querystring is composed?
Sorry I am really a noob web programmer.
Wouldn't it make the rest of the url less readable/hackable?
I don't want this
{"bla":1}
To reslut in
%7B%22bla%22%3A1%7D
@awalterschulze well, if you are missing other sequences that need escaping, it will simply break.
I suggest to use the correct JavaScript function encodeURIComponent and then if you want to have some characters encoded by the browser (that's what happens when you type them non-encoded, and when it doesn't happen it just breaks server-side as per OP) you can replace after the call to the proper encoding function. The list of reserved characters is here for example: https://en.wikipedia.org/wiki/Percent-encoding
I am going on holiday, but I'll be back in about a week, then I'll look at this again.
Thank you for the help.
Yes if there are other sequences that require escaping it will break :(
I am struggling to understand the second paragraph.
The wikipedia page says that encodeURIComponent is a non-standard function, or am I reading it wrong?
Maybe there is a better js function?
Reserved characters that don't cause problems
! $ ( ) * + , / : ? @ [ ]
Reserved characters that cause problems
# ; =
Reserved characters that cause problems that have been fixed
& '
Percent is also a problem, but has now been fixed
%
But I haven't taken into account character sequences.
Do you have any examples?
@awalterschulze sorry, this issue went out of my radar.
The wikipedia page says
Non-standard implementations
There exists a non-standard encoding for Unicode characters: %uxxxx, where xxxx is a UTF-16 code unit represented as four hexadecimal digits. This behavior is not specified by any RFC and has been rejected by the W3C. The third edition of ECMA-262 still includes an escape function that uses this syntax, along with encodeURI and encodeURIComponent functions, which apply UTF-8 encoding to a string, then percent-escape the resulting bytes.
I think you're reading it wrong :) It's referring to unicode not specifically to the encodeURIComponent function which is standard since ECMA3, a summary is here: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent
What is non-standard is not encoding some characters, because browsers might vary in the way of handling it. But let's say that letmegrpc is only supposed to be used with a few browsers that do support our hack with some characters, then what I proposed before in that 2nd paragraph was:
- use encodeURIComponent on all input from user
- do a string replace for some of the characters you want instead the browser to take care of e.g.
%24%20%2B%20%2C%20%2F%20%3A%20%3F%20%40%20%5B%20%5D%20%7B%20%7D
(which is the encoding for$ + , / : ? @ [ ] { }
)
What I mean in (2) is that you have some decoding with a simple string replacement (it's safe to do on an encoded string). Example snippet:
function encodePartially(userInput) {
var encoded = encodeURIComponent(userInput);
return encoded.replace("%24", "$").replace("%7D", "}"); // add more here
}
I would still suggest to limit the replacement, leaving out of this 2nd-pass replacement ?
for example (a known query-string delimiter)
Ok so you want to use encodeURIComponent and then "decode" some of the encoded characters?
I just want to make sure I understand you clearly.
Instead of only encoding some of the characters.
This way their will be a better catch all, but we can still try and make the URL readable and editable by "decoding" most of the characters.
Is that what you meant?
@awalterschulze yes that! :)
Ok so characters that would be decoded include:
, } { <all ascii characters> "
Does that sound about right?