Accept json=raw() directly to skip rawToChar step on JSON received from curl
MichaelChirico opened this issue
As identified in this Twitter thread:
https://twitter.com/michael_chirico/status/1280656819606548480
See this Gist:
https://gist.github.com/MichaelChirico/f5e09ab9f5f437bb0286e8a42941a3e1
The performance of fparse is already damn impressive, but let's see if we can't do a mite better 😎
JSON as raw can be retrieved like so:
gist = file.path(
'https://gist.githubusercontent.com/MichaelChirico',
'f5e09ab9f5f437bb0286e8a42941a3e1', 'raw',
'ab5f767b54810b53b30841ffe7f614aa07a32be0', 'presto_json_return.R'
)
charToRaw(tail(readLines(gist), 1L))
IINM from C++ POV this raw vector should just be a subset of a character vector...
The potential kibosh for this would be encoding issues, though rawToChar also would not work for that case, so users with encoding issues can do iconv themselves I guess?
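For the "do iconv themselves" route, here is a minimal base-R sketch. The Latin-1 payload is made up for illustration and the variable names are hypothetical. It relies on iconv() accepting a list of raw vectors, so the re-encoding can start from the raw bytes:

```r
# Hypothetical Latin-1 payload: byte 0xE9 is "é" in Latin-1 and is not valid UTF-8 on its own
latin1_json <- c(charToRaw('{"name":"caf'), as.raw(0xe9), charToRaw('"}'))

# iconv() takes a list of raw vectors as input and returns a character vector
utf8_json <- iconv(list(latin1_json), from = "latin1", to = "UTF-8")

# Back to raw, now UTF-8 encoded, for a parser that accepts raw input
utf8_raw <- charToRaw(utf8_json)
```

This still builds one intermediate string, so it only makes sense for payloads that actually need re-encoding.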
Still working on adapting my code to use RcppSimdJson & benchmarking, so another musing for the day --
I think a major choke point of my current code is some regular gc()s that are happening, which skipping rawToChar could potentially avert. IINM I am calling rawToChar on a huge string on every batch from GET, then parsing out my data & "discarding" the rather large strings (consisting of JSON objects with maybe 100s of rows and/or columns) which are now in the session's string cache (since rawToChar will do mkChar).
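A rough sketch of the two code paths being compared. The payload is a small made-up stand-in for a GET response body, and the direct-from-raw call assumes an RcppSimdJson version whose fparse() accepts raw vectors (the feature requested here), so treat it as illustrative rather than definitive:

```r
# Hypothetical payload standing in for one batch returned by GET
payload <- charToRaw('{"columns":["a","b"],"data":[[1,2],[3,4]]}')

# Current path: materialise an R string first, then parse it
as_string <- rawToChar(payload)

# The intermediate string carries the same bytes as the raw vector
stopifnot(identical(charToRaw(as_string), payload))

if (requireNamespace("RcppSimdJson", quietly = TRUE)) {
  via_string <- RcppSimdJson::fparse(as_string)
  # Requested path: hand the raw vector to the parser, skipping rawToChar
  # (assumes a version of fparse() that accepts raw input)
  via_raw <- RcppSimdJson::fparse(payload)
  stopifnot(identical(via_string, via_raw))
}
```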
It's also something to keep in mind for benchmarking -- unless this phenomenon is captured, the benefit of dropping rawToChar might be understated.
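One way a benchmark could try to capture this is to record collector time alongside wall time. This is a sketch using only base R. The payloads are synthetic and their number and size are arbitrary assumptions, not my real data:

```r
# Hypothetical stand-ins for GET response bodies: 20 distinct payloads of roughly 1 MB each
body <- paste(rep('{"x":1}', 150000L), collapse = ",")
batches <- lapply(seq_len(20L), function(i) {
  charToRaw(paste0('{"batch":', i, ',"rows":[', body, ']}'))
})

invisible(gc())          # start from a freshly collected heap
gc_before <- gc.time()   # cumulative time spent in the collector so far

elapsed <- system.time({
  for (b in batches) {
    s <- rawToChar(b)    # each distinct payload becomes a new large string
    s <- NULL            # dropped right away and left for the collector
  }
}, gcFirst = FALSE)

gc_spent <- gc.time() - gc_before
print(elapsed)
print(gc_spent)
```

Comparing gc_spent between a run that goes through rawToChar and one that does not would show how much of the difference is collector time rather than conversion time.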
Done in #36