apache / couchdb-fauxton

Fauxton is the new Web UI for CouchDB

Home Page: https://github.com/apache/couchdb-fauxton

Data corruption of numeric fields

redwheelbarrow opened this issue

Fauxton displays 64-bit numbers from CouchDB incorrectly. Saving the document results in the number being changed to the incorrect, displayed value.

Expected Behavior

The displayed number should be what is in the database

Current Behavior

When saving documents with larger 64-bit numbers, the displayed value when viewing the document in Fauxton is usually rounded to the next 1000. Querying couchdb shows the original number. Saving the document in Fauxton results in changing the number in couchdb to the incorrectly displayed number.

Possible Solution

Not familiar with code base

Steps to Reproduce (for bugs)

  1. Create a document with the maximum signed 64-bit value (9_223_372_036_854_775_807)
  2. View the document in Fauxton - observe the value is displayed as 9_223_372_036_854_776_000
  3. Curl the same document and observe the value is correct
  4. Save the document in Fauxton
  5. Curl the document again to observe the value has been changed to 9_223_372_036_854_776_000
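The round trip in these steps can be reproduced without Fauxton, as a minimal sketch in plain JavaScript. The document body and the `doc1` id are made up for illustration; the only assumption is that the UI parses the response with `JSON.parse` and serializes it again on save.

```javascript
// Raw response body as it would arrive over HTTP (still text, not yet parsed).
// The "_id" value is hypothetical.
const raw = '{"_id":"doc1","n":9223372036854775807}';

// JSON.parse turns every JSON number into an IEEE 754 double.
const doc = JSON.parse(raw);

// Serializing again writes the rounded value, which is what a save would send back.
const roundTripped = JSON.stringify(doc);

console.log(doc.n === 2 ** 63); // true: the value was rounded up to 2^63
console.log(roundTripped);      // {"_id":"doc1","n":9223372036854776000}
```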

Context

  1. It makes it impossible to use Fauxton for resolving customer issues because of the risk of altering data unintentionally

Your Environment

  • Version used: 3.3.3
  • Browser Name and version: Both Edge and Brave
  • Operating System and version (desktop or mobile): Windows 11 Home

Not sure this is something that can be addressed but I will drop some notes from my research on this topic.

First of all, there's no 64-bit signed integer in JSON. The JSON RFC doesn't specify a limit on number precision, but it sort of implies implementations should use 64-bit floats for interoperability. That would make the range of safely interoperable integers [-(2^53)+1, (2^53)-1].

9_223_372_036_854_775_807 (max signed int64) 
9_007_199_254_740_991 (2**53-1)

2**53-1 also matches Number.MAX_SAFE_INTEGER from JavaScript, so one can argue that Fauxton supports the "safe" range for JSON numbers as per the RFC.
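A quick way to see where the safe range ends, using only built-in JavaScript (nothing here is specific to Fauxton or CouchDB):

```javascript
const maxSafe = Number.MAX_SAFE_INTEGER; // 2**53 - 1 = 9007199254740991
const maxInt64 = 9223372036854775807n;   // BigInt literal, held exactly

console.log(maxSafe === 2 ** 53 - 1);           // true
console.log(Number.isSafeInteger(maxSafe));     // true
console.log(Number.isSafeInteger(maxSafe + 1)); // false
console.log(2 ** 53 === 2 ** 53 + 1);           // true: two integers collapse to one double
console.log(Number(maxInt64) === 2 ** 63);      // true: max int64 rounds up to 2^63
```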

I can't speak for the CouchDB implementation but it doesn't seem to impose a limit on number values as I was able to add the ridiculously long value below. That means it's treating the JSON as a sequence of characters which is exactly what it is, without converting to any specific machine representation for numbers.

{"_id":"doc9","_rev":"1-25ce58eaae037addc94b1fcb06385f5e","n":92233720368547769995425325439238473892174983721894738921478392714893721892}

But there's a catch: if you try to use a map/reduce view, you're faced with the same loss of precision since it goes through the JS engine. For instance, using the view below

function (doc) {
  if (doc.n) {
    emit(doc._id, doc.n);
  }
}

I get these values:

{"total_rows":5,"offset":0,"rows":[
{"id":"doc2","key":"doc2","value":2132143},
{"id":"doc6","key":"doc6","value":9223372036854776000},
{"id":"doc7","key":"doc7","value":9223372036854778000},
{"id":"doc8","key":"doc8","value":9007199254740991},
{"id":"doc9","key":"doc9","value":9.223372036854777e+73}
]}

where
     {"_id":"doc6","_rev":"1-6292f628aa9e691f51018f0cf1953e37","n":9223372036854776807}
     {"_id":"doc9","_rev":"1-25ce58eaae037addc94b1fcb06385f5e","n":92233720368547769995425325439238473892174983721894738921478392714893721892}
     

So my take is that there's an implicit limit to what numbers you can store in CouchDB so you don't run into loss of precision, and that limit matches what Fauxton supports.


All that said, the potential for data loss is still there. In theory, Fauxton could be updated to treat JSON as a string only and never parse the value as an object, but the same would have to be true for any JS dependencies in use. All in all, it's a hard ask, and I'll leave it at that for now.
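To illustrate what working on the raw text could look like, here is a rough sketch of a guard that scans a JSON string for integer literals outside the safe range before anything is parsed. `findUnsafeIntegers` is a hypothetical helper, not something Fauxton has, and it is deliberately conservative: it flags every integer beyond ±(2^53 - 1), even the few that happen to be exactly representable.

```javascript
// Hypothetical helper: returns the integer literals in rawJson that fall
// outside the safe integer range. Assumes rawJson is valid JSON text.
function findUnsafeIntegers(rawJson) {
  const numberToken = /-?\d+(\.\d+)?([eE][+-]?\d+)?/y; // sticky: match at lastIndex only
  const unsafe = [];
  let i = 0;
  while (i < rawJson.length) {
    const ch = rawJson[i];
    if (ch === '"') {
      // Skip over a string so digits inside it are never treated as numbers.
      i++;
      while (i < rawJson.length && rawJson[i] !== '"') {
        i += rawJson[i] === '\\' ? 2 : 1; // step past escaped characters
      }
      i++;
    } else if (ch === '-' || (ch >= '0' && ch <= '9')) {
      numberToken.lastIndex = i;
      const match = numberToken.exec(rawJson);
      if (match === null) { i++; continue; }
      const token = match[0];
      // Only plain integers are checked; fractions and exponents are ignored here.
      if (/^-?\d+$/.test(token) && !Number.isSafeInteger(Number(token))) {
        unsafe.push(token);
      }
      i += token.length;
    } else {
      i++;
    }
  }
  return unsafe;
}

const sample = '{"a":"123456789012345678901234","n":9223372036854775807,"m":42,"f":1.5}';
console.log(findUnsafeIntegers(sample)); // [ '9223372036854775807' ]
```

A UI could use a check like this to warn, or refuse to save, when a document contains values it cannot round-trip.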

Agree with @Antonio-Maranhao. Any JSON numbers passed through a JS environment, whether a browser or the indexing JS engine (SpiderMonkey currently), will have these issues.

To maintain precision at those sizes, try storing numbers as strings.

Agreeing with both: this is a consequence of the number passing through a JavaScript engine, rather than anything inherent to CouchDB itself. You can store an integer field in a CouchDB document that goes well beyond that (as Erlang supports multi-precision ints), but avoiding all JS paths is a bit trickier. A JSON index would avoid it, as would a built-in reduce.

Also agreeing that if you want numeric precision beyond 64-bit floating point within JavaScript you'll need to store them as something other than a JSON number (strings, say), and then use a math library to manipulate them (noting that math.js has a serialization format that's a JSON object with string values).
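As a small sketch of the string approach: the example below uses JavaScript's built-in `BigInt` for the arithmetic instead of a math library, which is enough for integers. The document shape and the `doc6` id are only illustrative.

```javascript
// The number is stored as a JSON string, so JSON.parse leaves the digits untouched.
const stored = '{"_id":"doc6","n":"9223372036854775807"}';
const doc = JSON.parse(stored);

// Convert to BigInt only when arithmetic is needed.
const n = BigInt(doc.n);
const next = n + 1n;

// Convert back to a string before serializing; JSON.stringify cannot serialize a BigInt.
const updated = JSON.stringify({ ...doc, n: next.toString() });

console.log(updated); // {"_id":"doc6","n":"9223372036854775808"}
```

The trade-off is that views and Mango queries see a string, so numeric sorting and comparison have to be handled by the application.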

"That means it's treating the JSON as a sequence of characters which is exactly what it is, without converting to any specific machine representation for numbers."

This isn't true. It's just that Jiffy (CouchDB's JSON parser) is capable of handling bignums easily since Erlang has bignum support built in.

This works by detecting when it successfully parses a number (i.e., we found a string that matches JSON grammar) but isn't capable of storing the value in a native type. When that happens, we set a flag and wrap the sub-binary with a tagged tuple that is then processed in Erlang.

The logic in C can be found here:

https://github.com/davisp/jiffy/blob/9ea1b35b6e60ba21dfd4adbd18e7916a831fd7d4/c_src/decoder.c#L588-L617

And then when the parsed JSON value is passed back to Erlang we run this function over it to get real bignums:

https://github.com/davisp/jiffy/blob/master/src/jiffy.erl#L111-L142

That said, if you're working with large numbers you'll want to either ensure that your JSON parser has bignum support, or follow @nickva's advice and store your numbers as strings to be interpreted at the application level.

So in summary:

  1. A 64-bit integer is stored in CouchDB correctly, and it can handle larger values as well
  2. The JavaScript engine(s) used by Fauxton and CouchDB OOTB seem to use 64-bit floats for large or all numbers, thus causing the modification.
  3. Just deal with it at the app level using strings

Gonna go ahead and close this, thank you

In Fauxton, the JavaScript engine is your browser. For CouchDB indexes (map/reduce, search, Mango) or other endpoints (like _update or validate_doc_update) it will be SpiderMonkey (the JS engine that Firefox uses).

Swapping out the JS engines won't help (and isn't possible within your browser afaik), but we never suggested that. Instead, store your larger-than-natively-supported numbers in some format that JavaScript won't break (like strings) and use other JavaScript to manipulate them (e.g. https://mathjs.org/).