minio / simdjson-go

Golang port of simdjson: parsing gigabytes of JSON per second

overflowing uint64 being parsed as float64

karlmcguire opened this issue

I'm using 3d975b7 as the last commit and here's how to reproduce the issue:

package main

import (
	"fmt"

	"github.com/minio/simdjson-go"
)

func main() {
	data := []byte(`{
		"number": 27670116110564327426
	}`)

	parsed, err := simdjson.Parse(data, nil)
	if err != nil {
		panic(err)
	}

	iter := parsed.Iter()
	iter.Advance()

	tmp := &simdjson.Iter{}
	obj := &simdjson.Object{}

	_, tmp, err = iter.Root(tmp)
	if err != nil {
		panic(err)
	}

	obj, err = tmp.Object(obj)
	if err != nil {
		panic(err)
	}

	// convert to map[string]interface{}
	m, err := obj.Map(nil)
	if err != nil {
		panic(err)
	}

	// prints: 2.7670116110564327e+19
	//
	// expected: some indication that the value overflows uint64
	fmt.Println(m["number"])
}

Interestingly, I found an article from 2018 where Dgraph was making the same mistake.

@karlmcguire That is by design. Numbers are untyped in JSON, so we pick the closest type that can hold the value, which here is float64.

If we rejected this input, we would be rejecting perfectly valid JSON such as your example. See: https://play.golang.org/p/3x0KdffDwTY

If you expect certain fields to contain only integers, you must add that validation to your own code.
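A minimal sketch of what such validation could look like on the caller's side. It assumes the values in the map arrive as int64, uint64, or float64 depending on how the number was parsed; `asUint64` is a hypothetical helper for illustration, not part of simdjson-go.

```go
package main

import "fmt"

// asUint64 is a hypothetical helper: it accepts only values that were
// parsed as integers and rejects anything that arrived as a float64,
// which includes integers that overflowed during parsing.
func asUint64(v interface{}) (uint64, error) {
	switch n := v.(type) {
	case uint64:
		return n, nil
	case int64:
		if n < 0 {
			return 0, fmt.Errorf("negative value %d does not fit in uint64", n)
		}
		return uint64(n), nil
	case float64:
		return 0, fmt.Errorf("value %v was parsed as a float, not an integer", n)
	default:
		return 0, fmt.Errorf("unexpected type %T", v)
	}
}

func main() {
	// Stand-in for the map returned by obj.Map(nil) in the report above.
	m := map[string]interface{}{
		"small":  int64(42),
		"number": 2.7670116110564327e+19, // stored as float64 after overflow
	}
	for _, key := range []string{"small", "number"} {
		n, err := asUint64(m[key])
		fmt.Println(key, n, err)
	}
}
```

The check is deliberately strict: any float64 is rejected, so a field written as `1.0` would fail as well. Whether that is acceptable depends on the schema being enforced.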

I see. I guess what I need is dec.UseNumber() from encoding/json but for simdjson-go.

@karlmcguire Technically the tape format could be modified to include a reference to the original ASCII value in the input. That would be the only reasonable way I see to provide access to that data. But that is a rather large change.

@karlmcguire Thinking about it, we do have extra bits available on the tape for float values. We can't provide the original number, but we can provide a flag "this is a float converted from integer notation, because it overflowed".
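To illustrate the idea, here is a simplified sketch of carrying such a flag in the spare bits of a tape entry. The layout, the constant names, and the `writeNumber` and `readFloat` helpers are all assumptions made for this example and do not reflect the actual simdjson-go tape or the change proposed in #31. The sketch handles only non-negative integers.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"strconv"
)

// Hypothetical layout for this sketch only: the top 8 bits of the first
// word hold a type tag, the low 56 bits are free for flags, and the
// second word holds the value itself.
const (
	tagUint               = uint64('u') << 56
	tagFloat              = uint64('d') << 56
	flagOverflowedInteger = uint64(1) // hypothetical flag bit
)

// writeNumber appends a number to the tape. An integer that is too large
// for uint64 is stored as a float with the overflow flag set.
func writeNumber(tape []uint64, text string) ([]uint64, error) {
	u, uerr := strconv.ParseUint(text, 10, 64)
	if uerr == nil {
		return append(tape, tagUint, u), nil
	}
	f, err := strconv.ParseFloat(text, 64)
	if err != nil {
		return tape, err
	}
	word := tagFloat
	if errors.Is(uerr, strconv.ErrRange) {
		// The text was integer notation, it just did not fit.
		word |= flagOverflowedInteger
	}
	return append(tape, word, math.Float64bits(f)), nil
}

// readFloat returns the float stored at index i and whether it was
// converted from an overflowing integer.
func readFloat(tape []uint64, i int) (float64, bool) {
	overflowed := tape[i]&flagOverflowedInteger != 0
	return math.Float64frombits(tape[i+1]), overflowed
}

func main() {
	tape, err := writeNumber(nil, "27670116110564327426")
	if err != nil {
		panic(err)
	}
	f, overflowed := readFloat(tape, 0)
	fmt.Println(f, overflowed)
}
```

The original digits are still lost, as noted above. The flag only tells the caller that the float came from integer notation, so it can reject the value or re-read the field from the raw input.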

@karlmcguire See #31 as a proposal.

That's exactly what I need. Thank you!