Edge cases and defaults
knapply opened this issue · comments
What are the edge cases {RcppSimdJson}
needs to handle?
What should be the defaults (internal and user-level)?
-
{}
Current candidates:[ ]named list()
:structure(list(), .Names = character(0))
same asjsonlite::fromJSON()
[ ]NULL
same asjsonify::from_json()
- Resolved by 94e571a (user passes any R object to replace empty objects with
NULL
as the defaut).
-
[]
Current candidates:[ ]vector()
:logical(length = 0L)
[ ]vector(mode = "list")
:list()
same asjsonify::from_json()
andjsonlite::fromJSON()
- Resolved by 94e571a (user passes any R object to replace empty arrays with
NULL
as the defaut).
-
null
- like
{}
and[]
, allow user to pass any R object to replace singlenull
, withNULL
as the default.
- like
@dcooley Moving the conversation from #17 over here:
[]
:vector()
{}
:structure(list(), .Names = character(0))
Here's the rationale for going about {}
and []
this way:
{}
If we say the R representation of {"a":1}
is list(a = 1L)
, we're really saying that its R representation is structure(list(1L), .Names = "a")
.
If we follow that analogy, empty object {}
is structure(list(), .Names = character(0))
, which print()
s as named list()
.
JSON | R Representation | equivalent/print() |
---|---|---|
{"a":1} |
structure(list(1L), .Names = "a") |
list(a = 1L) |
{} |
structure(list(), .Names = character()) |
named list() |
[]
And similarly you're returing [] as NA_LOGICAL?
See above: []
returns Rcpp::LogicalVector(0)
(vector of length 0):
RcppSimdJson:::.deserialize_json('[]')
#> logical(0)
This is really a matter of deciding which of the following represents []
:
vector()
vector(mode = "list")
.
But it does raise a few other questions:
- Are empty arrays recursive?
- Are empty arrays homogeneous?
- Are empty arrays "just arrays"... or are they really a special case?
{jsonify}
and {jsonlite}
use this pattern for simplifcation:
JSON | R | str() |
---|---|---|
["a",3.14,1,true,null] |
c("a","3.14","1","TRUE",NA) |
chr [1:5] "a" "3.14" "1" "TRUE" NA |
[3.14,1,true,null] |
c(3.14, 1, 1, NA) |
num [1:4] 3.14 1 1 NA |
[1,true,null]' |
c(1L, 1L, NA) |
int [1:3] 1 1 NA |
[true,null]' |
c(TRUE, NA) |
logi [1:2] TRUE NA |
[null] |
NA |
logi NA |
[] |
list() |
list() |
RcppSimdJson:::.deserialize_json()
's current default uses rcppsimdjson::Type_Policy::anything_goes
, which follows nearly the same pattern of collapsing arrays of scalars into R vectors, no matter what.
Here's the same table with the proposed default {RcppSimdJson}
pattern for simplification in bold:
JSON | R | str() |
---|---|---|
["a",3.14,1,true,null] |
c("a","3.14","1","TRUE",NA) |
chr [1:5] "a" "3.14" "1" "TRUE" NA |
[3.14,1,true,null] |
c(3.14, 1, 1, NA) |
num [1:4] 3.14 1 1 NA |
[1,true,null]' |
c(1L, 1L, NA) |
int [1:3] 1 1 NA |
[true,null]' |
c(TRUE, NA) |
logi [1:2] TRUE NA |
[null] |
NA |
logi NA |
[] |
logical(0) |
logi(0) |
[] |
list() |
list() |
The idea behind the proposed {RcppSimdJson}
approach is to just consistently following the same pattern.
I'd say the difference is that it treats []
as non-recursive and homogeneous (thus becoming a vector) instead of a special case like {jsonify}
and {jsonlite}
seem to do (potentially recursive and/or homogeneous?).
🤷♂️ Are there parsers / linters / specs / best practices?
I like how set user-drive policy options otherwise. That is simply the most flexible.
@lemire Thoughts?
It seems @dcooley and I don't quite agree with some of the default choices {jsonlite}
makes. There are compromises and edge cases no matter how you do it, so a 100% consensus isn't likely.
Jeroen Ooms' The jsonlite Package: A Practical and Consistent Mapping Between JSON Data and R Objects is a wonderful deep dive into the a lot of the gory, R-specific details.
@knapply your logic all makes complete sense. I also tried to stay consistent with jsonlite
, but I think I deviated away where round-trips didn't work because I wanted a consistent way to get to and from the same object. (I'm not saying I was right, but it's what I did).
toJSON( NULL )
# {}
fromJSON('{}')
# named list()
to_json( NULL )
# {}
from_json('{}')
NULL
So in summary, I don't have any issues with the defaults chosen for RcppSimdJson
, but I think the round-trip is something to keep in mind if/when we can write the to json
side of it.
I'll try and sort the user-passable options ASAP so we at least have maximum flexibility.
I think there are good arguments both ways to be honest.
I have a deployed shiny app that uses {jsonify}
to serialize some crazy list columns so users can download spreadsheets, make modifications, and re-upload the changes. We're talking 100k+ row monstrosities that {jsonify}
handles with ease.
The tinkering I had to do was all in excel-related packages to get round trips to pass identical()
testing, not at all in {jsonify}
. I don't think there's a perfect answer for what "right" looks here, but you can't really do better than that!
I swapped out the JSON parsing routine for an API client with .deserialize_json()
for another real-world use case and realized that allowing the user to also replace single null
s with any R object would eliminate a whole step in cleaning up JSON responses from APIs that spam null
s in every object.
I probably have some variation of %||%
defined in any code that touches a web API, so I don't know why I didn't think of this earlier.
- This would
- eliminate that the need for the user to do that at the R level
- removing a commonly needed step in constructing data frames manually
- provide a way to match
jsonify::from_json(simplify = FALSE)
's behavior - be totally opt-in, as the default can just be set to
NULL
- eliminate that the need for the user to do that at the R level
So it's on the TODO list.
Anything actionable here now? I may have lost track of the main storyline...
The last open checkbox/comment before yours #18 (comment) (unless someone makes a case why it's a bad idea) is the only thing remaining.
It's a matter of adding a single_null = R_NilValue
parameter (single_null=
is just the first thing that comes to mind) to .deserialize_json()
and .load_json()
and all the rcppsimdjson::deserialize::simplify_*()
functions and anything calling them.
Ultimately, the returned value is here:
rcppsimdjson/inst/include/RcppSimdJson/deserialize/simplify.hpp
Lines 211 to 212 in d1b22f3
so that would just need to change from R_NilValue
to single_null
.
It's the same pattern as empty_array=
and empty_object=
.
I left the issue open as a reminder to myself, but it can be closed.
Is what we have "good enough" to be added to master to form 0.0.6?
Yes.
Ok, no pending merge or pr. I look over what we have and roll it up this eve or tomorrow.