goccy / sonic

A blazingly fast JSON serializing & deserializing library

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Sonic

A blazingly fast JSON serializing & deserializing library, accelerated by JIT (just-in-time compiling) and SIMD (single-instruction-multiple-data).

Requirement

  • Go 1.15/1.16/1.17
  • Linux/darwin OS
  • Amd64 CPU with AVX instruction set

Features

  • Runtime object binding without code generation
  • Complete APIs for JSON value manipulation
  • Fast, fast, fast!

Benchmarks

For all sizes of json and all scenarios of usage, Sonic performs best.

  • Medium (13KB, 300+ key, 6 layers)
goversion: 1.17.1
goos: darwin
goarch: amd64
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkEncoder_Generic_Sonic                        25181 ns/op         517.65 MB/s       13035 B/op          4 allocs/op
BenchmarkEncoder_Generic_JsonIter                     43765 ns/op         297.84 MB/s       13433 B/op         77 allocs/op
BenchmarkEncoder_Generic_StdLib                      108776 ns/op         119.83 MB/s       49137 B/op        827 allocs/op
BenchmarkEncoder_Binding_Sonic                         6282 ns/op        2075.01 MB/s       13765 B/op          4 allocs/op
BenchmarkEncoder_Binding_JsonIter                     20740 ns/op         628.51 MB/s        9487 B/op          2 allocs/op
BenchmarkEncoder_Binding_StdLib                       16661 ns/op         782.34 MB/s        9479 B/op          1 allocs/op
BenchmarkEncoder_Parallel_Generic_Sonic-16             4072 ns/op        3200.89 MB/s       11052 B/op          4 allocs/op
BenchmarkEncoder_Parallel_Generic_JsonIter-16         11379 ns/op        1145.52 MB/s       13458 B/op         77 allocs/op
BenchmarkEncoder_Parallel_Generic_StdLib-16           50635 ns/op         257.43 MB/s       49183 B/op        827 allocs/op
BenchmarkEncoder_Parallel_Binding_Sonic-16             1304 ns/op        9994.64 MB/s       10925 B/op          4 allocs/op
BenchmarkEncoder_Parallel_Binding_JsonIter-16          6072 ns/op        2146.76 MB/s        9505 B/op          2 allocs/op
BenchmarkEncoder_Parallel_Binding_StdLib-16            3510 ns/op        3713.89 MB/s        9481 B/op          1 allocs/op

BenchmarkDecoder_Generic_Sonic                        53843 ns/op         242.09 MB/s       49779 B/op        313 allocs/op
BenchmarkDecoder_Generic_StdLib                      130402 ns/op          99.96 MB/s       50868 B/op        772 allocs/op
BenchmarkDecoder_Generic_JsonIter                     92810 ns/op         140.45 MB/s       55788 B/op       1068 allocs/op
BenchmarkDecoder_Binding_Sonic                        29793 ns/op         437.52 MB/s       24778 B/op         34 allocs/op
BenchmarkDecoder_Binding_StdLib                      121206 ns/op         107.54 MB/s       10576 B/op        208 allocs/op
BenchmarkDecoder_Binding_JsonIter                     36099 ns/op         361.09 MB/s       14674 B/op        385 allocs/op
BenchmarkDecoder_Parallel_Generic_Sonic-16            10319 ns/op        1263.21 MB/s       49423 B/op        313 allocs/op
BenchmarkDecoder_Parallel_Generic_StdLib-16           58526 ns/op         222.72 MB/s       50875 B/op        772 allocs/op
BenchmarkDecoder_Parallel_Generic_JsonIter-16         60156 ns/op         216.69 MB/s       55812 B/op       1068 allocs/op
BenchmarkDecoder_Parallel_Binding_Sonic-16             7265 ns/op        1794.18 MB/s       24952 B/op         34 allocs/op
BenchmarkDecoder_Parallel_Binding_StdLib-16           44000 ns/op         296.25 MB/s       10575 B/op        208 allocs/op
BenchmarkDecoder_Parallel_Binding_JsonIter-16         21029 ns/op         619.86 MB/s       14678 B/op        385 allocs/op

BenchmarkGetOne_Sonic                                 17070 ns/op         762.94 MB/s          29 B/op          1 allocs/op
BenchmarkGetOne_Gjson                                 19714 ns/op         660.59 MB/s           0 B/op          0 allocs/op
BenchmarkGetOne_Jsoniter                              99281 ns/op         131.17 MB/s       27936 B/op        647 allocs/op
BenchmarkSetOne_Sonic                                 23730 ns/op         548.80 MB/s        1883 B/op         17 allocs/op
BenchmarkSetOne_Sjson                                 57680 ns/op         225.78 MB/s       52180 B/op          9 allocs/op
BenchmarkSetOne_Jsoniter                             104018 ns/op         125.20 MB/s       45859 B/op        964 allocs/op
BenchmarkGetOne_Parallel_Sonic-16                      2010 ns/op        6479.41 MB/s         114 B/op          1 allocs/op
BenchmarkGetOne_Parallel_Gjson-16                      1815 ns/op        7176.39 MB/s           0 B/op          0 allocs/op
BenchmarkGetOne_Parallel_Jsoniter-16                  23261 ns/op         559.86 MB/s       27942 B/op        647 allocs/op
BenchmarkSetOne_Parallel_Sonic-16                      2007 ns/op        6487.78 MB/s        2202 B/op         17 allocs/op
BenchmarkSetOne_Parallel_Sjson-16                     12422 ns/op        1048.40 MB/s       52180 B/op          9 allocs/op
BenchmarkSetOne_Parallel_Jsoniter-16                  39204 ns/op         332.18 MB/s       45889 B/op        964 allocs/op
  • Small (400B, 11 keys, 3 layers) small benchmarks
  • Large (635KB, 10000+ key, 6 layers) large benchmarks

See bench.sh for benchmark codes.

How it works

See INTRODUCTION.md.

Usage

Marshal/Unmarshal

Their behaviors are mostly consistent with encoding/json, except two escaping form (see issue4) that is NOT in conformity to RFC8259.

import "github.com/bytedance/sonic"

var data YourSchema
// Marshal
output, err := sonic.Marshal(&data) 
// Unmarshal
err := sonic.Unmarshal(output, &data) 

Use Number/Use Int64

import "github.com/bytedance/sonic/decoder"

var input = `1`
var data interface{}

// default float64
dc := decoder.NewDecoder(input) 
dc.Decode(&data) // data == float64(1)
// use json.Number
dc = decoder.NewDecoder(input)
dc.UseNumber()
dc.Decode(&data) // data == json.Number("1")
// use int64
dc = decoder.NewDecoder(input)
dc.UseInt64()
dc.Decode(&data) // data == int64(1)

root, err := sonic.GetFromString(input)
// Get json.Number
jn := root.Number()
jm := root.InterfaceUseNumber().(json.Number) // jn == jm
// Get float64
fn := root.Float64()
fm := root.Interface().(float64) // jn == jm

Sort Keys

On account of the performance loss from sorting (roughly 10%), sonic doesn't enable this feature by default. If your component depends on it to work (like zstd), Use it like this:

import "github.com/bytedance/sonic/encoder"

m := map[string]interface{}{}
v, err := encoder.Encode(m, encoder.SortMapKeys)

Print Syntax Error

import "github.com/bytedance/sonic/decoder"

var data interface{}
dc := decoder.NewDecoder("[[[}]]")
if err := dc.Decode(&data); err != nil {
    if e, ok := err.(decoder.SyntaxError); ok {
        
        /*Syntax error at index 3: invalid char

            [[[}]]
            ...^..
        */
        print(e.Description())

        /*"Syntax error at index 3: invalid char\n\n\t[[[}]]\n\t...^..\n"*/
        println(fmt.Sprintf("%q", e.Description()))
    }

    /*Decode: Syntax error at index 3: invalid char*/
    t.Fatalf("Decode: %v", err) 
}

Ast.Node

Sonic/ast.Node is a completely self-contained AST for JSON. It implements serialization and deserialization both, and provides robust APIs for obtaining and modification of generic data.

Get/Index

Search partial JSON by given paths, which must be non-negative integer or string or nil

import "github.com/bytedance/sonic"

input := []byte(`{"key1":[{},{"key2":{"key3":[1,2,3]}}]}`)

// no path, returns entire json
root, err := sonic.Get(input)
raw := root.Raw() // == string(input)

// multiple pathes
root, err := sonic.Get(input, "key1", 1, "key2")
sub := root.Get("key3").Index(2).Int64() // == 3

Tip: since Index() uses offset to locate data, which is faster much than scanning like Get(), we suggest you use it as much as possible. And sonic also provides another API IndexOrGet() to underlying use offset as well as ensuring the key is matched.

Set/Unset

Modify the json content by Set()/Unset()

import "github.com/bytedance/sonic"

// Set
exist, err := root.Set("key4", NewBool(true)) // exist == false
alias1 := root.Get("key4") 
println(alias1.Valid()) // true
alias2 := root.Index(1)
println(alias1 == alias2) // true

// Unset
exist, err := root.UnsetByIndex(1) // exist == true
println(root.Get("key4").Check()) // "value not exist"

Serialize

To encode ast.Node as json, use MarshalJson() or json.Marshal() (MUST pass the node's pointer)

import (
    "encoding/json"
    "github.com/bytedance/sonic"
)

buf, err := root.MarshalJson()
println(string(buf))                // {"key1":[{},{"key2":{"key3":[1,2,3]}}]}
exp, err := json.Marshal(&root)     // WARN: use pointer
println(string(buf) == string(exp)) // true

APIs

  • validation: Check(), Error(), Valid(), Exist()
  • searching: Index(), Get(), IndexPair(), IndexOrGet(), GetByPath()
  • go-type casting: Int64(), Float64(), String(), Number(), Bool(), Map[UseNumber|UseNode](), Array[UseNumber|UseNode](), Interface[UseNumber|UseNode]()
  • go-type packing: NewRaw(), NewNumber(), NewNull(), NewBool(), NewString(), NewObject(), NewArray()
  • iteration: Values(), Properties()
  • modification: Set(), SetByIndex(), Add(), Cap(), Len()

Tips

Pretouch

Since Sonic uses golang-asm as a JIT assembler, which is NOT very suitable for runtime compiling, first-hit running of a huge schema may cause request-timeout or even process-OOM. For better stability, we advise to use Pretouch() for huge-schema or compact-memory application before Marshal()/Unmarshal().

import (
    "reflect"
    "github.com/bytedance/sonic"
    "github.com/bytedance/sonic/option"
 )
 
 func init() {
     var v HugeStruct
    // For most large types (nesting depth <= 5)
     err := sonic.Pretouch(reflect.TypeOf(v))
    // If the type is too deep nesting (nesting depth > 5),
    // you can set compile recursive depth in Pretouch for better stability in JIT.
    err := sonic.Pretouch(reflect.TypeOf(v), option.WithCompileRecursiveDepth(depth))

Pass string or []byte?

For alignment to encoding/json, we provide API to pass []byte as an argument, but the string-to-bytes copy is conducted at the same time considering safety, which may lose performance when origin JSON is huge. Therefore, you can use UnmarshalString and GetFromString to pass a string, as long as your origin data is a string or nocopy-cast is safe for your []byte.

Better performance for generic data

In fully-parsed scenario, Unmarshal() performs better than Get()+Node.Interface(). But if you only have a part of schema for specific json, you can combine Get() and Unmarshal() together:

import "github.com/bytedance/sonic"

node, err := sonic.GetFromString(_TwitterJson, "statuses", 3, "user")
var user User // your partial schema...
err = sonic.UnmarshalString(node.Raw(), &user)

Even if you don't have any schema, use ast.Node as the container of generic values instead of map or interface:

import "github.com/bytedance/sonic"

root, err := sonic.GetFromString(_TwitterJson)
user := root.GetByPath("statuses", 3, "user")  // === root.Get("status").Index(3).Get("user")
err = user.Check()

// err = user.LoadAll() // only call this when you want to use 'user' concurrently...
go someFunc(user)

Why? Because ast.Node stores its children using array:

  • Map's performance degrades a lot once rehashing triggered, but ast.Node doesn't have this concern;
  • Hashing (map[x]) is not as efficient as Indexing (array[x]), which ast.Node can conduct on both array and object.
  • Using Interface()/Map() means Sonic must parse all the underlying values, while in most cases you don't need them all;

CAUTION: ast.Node DOESN'T ensure concurrent security directly, due to its lazy-load design. However, your can call Node.Load()/Node.LoadAll() to achieve that, which may bring performance reduction while it still works faster than converting to map or interface{}

About

A blazingly fast JSON serializing & deserializing library

License:Apache License 2.0


Languages

Language:Assembly 55.9%Language:Go 37.8%Language:C 6.1%Language:Shell 0.1%Language:Makefile 0.1%