Why does SetJSON replace & in the input with \u0026?

Question

zohu opened this issue 2 years ago · comments

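Right before calling SetJSON, the value is still https://uat.xxx.com?a=1&b=1
Stepping through with a breakpoint, the value passed to SetJSON also contains &, yet what gout actually sends contains \u0026. Why?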

2022-04-20 09:51:01.018 INFO    log/writer.go:21        > POST /cgi-bin/menu/create?access_token=56_zk9P0-rpBoNUhlneUAVJfABALGU HTTP/1.1
2022-04-20 09:51:01.018 INFO    log/writer.go:21        > Content-Type: application/json
2022-04-20 09:51:01.018 INFO    log/writer.go:21        >
2022-04-20 09:51:01.018 INFO    log/writer.go:21        
2022-04-20 09:51:01.018 INFO    log/writer.go:21        {"button":[{"type":"view","name":"今日歌曲","url":"https://uat.xxx.com?a=1\u0026b=1"}]}
2022-04-20 09:51:01.018 INFO    log/writer.go:21        
2022-04-20 09:51:01.018 INFO    log/writer.go:21        < HTTP/1.1 200 OK
2022-04-20 09:51:01.018 INFO    log/writer.go:21        < Connection: keep-alive
2022-04-20 09:51:01.018 INFO    log/writer.go:21        < Content-Type: application/json; encoding=utf-8
2022-04-20 09:51:01.018 INFO    log/writer.go:21        < Date: Wed, 20 Apr 2022 01:51:01 GMT
2022-04-20 09:51:01.018 INFO    log/writer.go:21        < Content-Length: 141
2022-04-20 09:51:01.018 INFO    log/writer.go:21        
2022-04-20 09:51:01.018 INFO    log/writer.go:21        
2022-04-20 09:51:01.018 INFO    log/writer.go:21        {"errcode":40033,"errmsg":"invalid charset. please check your request, if include \\uxxxx will create fail! rid: 625f6705-437db838-0f7f9178"}

ZoHo commented 2 years ago

#332

ZoHo · Answer 1 · Wed Apr 20 2022 10:35:18 GMT+0800 (China Standard Time)

json.Marshal sets escapeHTML to true by default, which escapes characters such as <, >, and &. I've opened a PR.
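
The default behavior can be reproduced with the standard library alone. The sketch below marshals a URL containing & and then decodes the result, to show that the escaped form still represents the same string value. The `menuButton` type and `marshalDefault` helper are illustrative names, not part of gout.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// menuButton is a hypothetical payload type used only for this illustration.
type menuButton struct {
	URL string `json:"url"`
}

// marshalDefault encodes v with json.Marshal, which always escapes <, >, and &.
func marshalDefault(v interface{}) string {
	out, err := json.Marshal(v)
	if err != nil {
		return ""
	}
	return string(out)
}

func main() {
	encoded := marshalDefault(menuButton{URL: "https://uat.xxx.com?a=1&b=1"})
	fmt.Println(encoded) // the & appears as \u0026 in the encoded bytes

	// Decoding restores the original character, so the escape is lossless JSON.
	var decoded menuButton
	if err := json.Unmarshal([]byte(encoded), &decoded); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(decoded.URL)
}
```

A standards-compliant JSON parser treats both forms as the same string. The failure in the log above comes from the receiving API rejecting \uXXXX sequences in the request body.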

guonaihong · Answer 2 · Wed Apr 20 2022 20:26:36 GMT+0800 (China Standard Time)

Thanks for the PR. The standard library method https://pkg.go.dev/encoding/json#Encoder.SetEscapeHTML can be used to implement this.

SetEscapeHTML specifies whether problematic HTML characters should be escaped inside JSON quoted strings. The default behavior is to escape &, <, and > to \u0026, \u003c, and \u003e to avoid certain safety problems that can arise when embedding JSON in HTML.

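I would prefer to keep the default semantics of SetJSON unchanged and add a new parameter or interface that does not escape HTML-related characters.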

ZoHo · Answer 3 · Thu Apr 21 2022 10:19:12 GMT+0800 (China Standard Time)

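Thanks for the reply. The existing code has a note saying it does not use the encoder. With the encoder we could pass escapeHTML, but according to the current code that means handling \n, and stripping \n seems more likely to cause collateral damage. From what I saw in the json.Marshal source, the escapeHTML handling is also a direct replacement.

I think a request body should stay consistent from input to output, what you see is what you get. The replacement approach also leaves any \u0026 sequences already present in the original data untouched, so "adding a new parameter or interface that does not escape HTML-related characters" seems of limited value, and it would still have to deal with \n.

I'll leave the trade-off to you. Thanks for your work.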

guonaihong · Answer 4 · Thu Apr 21 2022 13:00:22 GMT+0800 (China Standard Time)

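Let me start with an idea for a minimal change.

"From what I saw in the json.Marshal source, the escapeHTML handling is also a direct replacement."

Do you mean the HTMLEscape function? I just took a quick look at the standard library source as well.
json.HTMLEscape changes four things: <, >, &, and finally the characters U+2028 and U+2029.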

  func HTMLEscape(dst *bytes.Buffer, src []byte) {
    // The characters can only appear in string literals,
    // so just scan the string one byte at a time.
    start := 0
    for i, c := range src {
      if c == '<' || c == '>' || c == '&' {
        if start < i {
          dst.Write(src[start:i])
        }
        dst.WriteString(`\u00`)
        dst.WriteByte(hex[c>>4])
        dst.WriteByte(hex[c&0xF])
        start = i + 1
      }
      // Convert U+2028 and U+2029 (E2 80 A8 and E2 80 A9).
      if c == 0xE2 && i+2 < len(src) && src[i+1] == 0x80 && src[i+2]&^1 == 0xA8 {
        if start < i {
          dst.Write(src[start:i])
        }
        dst.WriteString(`\u202`)
        dst.WriteByte(hex[src[i+2]&0xF])
        start = i + 3
      }
    }
    if start < len(src) {
      dst.Write(src[start:])
    }
  }
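
To see these substitutions in action, a small sketch can call the exported json.HTMLEscape on already-encoded JSON. The `escapeHTML` wrapper is an illustrative name introduced here, not something from gout or the standard library.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// escapeHTML runs json.HTMLEscape over already-encoded JSON and returns the result.
func escapeHTML(src string) string {
	var dst bytes.Buffer
	json.HTMLEscape(&dst, []byte(src))
	return dst.String()
}

func main() {
	// <, >, and & become \u003c, \u003e, and \u0026.
	fmt.Println(escapeHTML(`{"cc":"<a&b>"}`))
	// The raw U+2028 character becomes the six-byte text \u2028.
	fmt.Println(escapeHTML("{\"sep\":\"\u2028\"}"))
}
```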

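"I think a request body should stay consistent from input to output, what you see is what you get."

That makes sense. So should we reverse what HTMLEscape does (to be decided)?
Reversing the escaping is fairly cumbersome. After thinking it over, using Encode is simpler.
Look at the implementation of Encode: the reason it was not used before is that the standard library, trying to be clever, adds the line e.WriteByte('\n').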

  func (enc *Encoder) Encode(v any) error {
    if enc.err != nil {
      return enc.err
    }
    e := newEncodeState()
    err := e.marshal(v, encOpts{escapeHTML: enc.escapeHTML})
    if err != nil {
      return err
    }

    // Terminate each value with a newline.
    // This makes the output look a little nicer
    // when debugging, and some kind of space
    // is required if the encoded value was a number,
    // so that the reader knows there aren't more
    // digits coming.
    e.WriteByte('\n')

    b := e.Bytes()
    if enc.indentPrefix != "" || enc.indentValue != "" {
      if enc.indentBuf == nil {
        enc.indentBuf = new(bytes.Buffer)
      }
      enc.indentBuf.Reset()
      err = Indent(enc.indentBuf, b, enc.indentPrefix, enc.indentValue)
      if err != nil {
        return err
      }
      b = enc.indentBuf.Bytes()
    }
    if _, err = enc.w.Write(b); err != nil {
      enc.err = err
    }
    encodeStatePool.Put(e)
    return err
  }

Since we know it always appends exactly one '\n', we can simply strip that '\n'.
I wrote a minimal prototype that does this.

  package main

  import (
    "bytes"
    "encoding/json"
    "fmt"
  )

  // bytesLine is the single trailing newline that Encoder.Encode appends.
  var bytesLine = []byte("\n")

  func main() {
    var b bytes.Buffer
    en := json.NewEncoder(&b)
    // Keep <, >, and & as-is instead of \u003c, \u003e, and \u0026.
    en.SetEscapeHTML(false)

    if err := en.Encode(map[string]interface{}{
      "aa": "aa",
      "bb": "bb",
      "cc": "<>",
    }); err != nil {
      fmt.Println("encode failed:", err)
      return
    }

    fmt.Printf("old.(%s)\n", b.String())
    // Drop only the one newline added by Encode.
    jsonBytes := bytes.TrimSuffix(b.Bytes(), bytesLine)

    fmt.Printf("new.(%s)\n", jsonBytes)
  }

ZoHo · Answer 5 · Thu Apr 21 2022 14:27:05 GMT+0800 (China Standard Time)

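Using Encode is the better choice.
I was misled by apifox, which adds \n to every line. After reading the Encode source, it indeed appends one only at the very end, so stripping it back off should be safe.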
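
The reasoning above can be checked with a small sketch: a newline inside a string value is written as the two-character escape \n, so with no indentation configured the only raw newline in the buffer is the one Encode appends. The `encodeNoEscape` helper is a hypothetical name for illustration and is not gout's actual implementation.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// encodeNoEscape is a hypothetical helper, not gout's actual implementation.
// It encodes v without HTML escaping and strips the newline Encode appends.
func encodeNoEscape(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(v); err != nil {
		return nil, err
	}
	return bytes.TrimSuffix(buf.Bytes(), []byte("\n")), nil
}

func main() {
	out, err := encodeNoEscape(map[string]string{
		"url":  "https://uat.xxx.com?a=1&b=1",
		"text": "line1\nline2", // contains a real newline character
	})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Printf("%s\n", out)
	// Reports whether any raw newline byte is left in the encoded output.
	fmt.Println(bytes.ContainsRune(out, '\n'))
}
```

Trimming a single suffix, rather than replacing every "\n" in the output, keeps the change limited to the byte that Encode itself added.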

guonaihong · Answer 6 · Sun Dec 18 2022 22:49:43 GMT+0800 (China Standard Time)

Added a new SetJSONNotEscape interface. It works like SetJSON but does not escape HTML special characters.