README.md - timings and maxrss
pkoppstein opened this issue
On the README.md page, some comparisons with jq are made.
Since jq also has a streaming parser, I believe it would be
helpful to compare the performance of the two streaming parsers.
It would also be helpful to add something about memory utilization;
since my /usr/bin/time reports maxrss (in bytes), I've included it below.
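For anyone who wants to collect the same figures from a script, the user/sys times and maxrss that /usr/bin/time prints can be approximated with Python's `resource` module. This is a minimal sketch, not the tool used for the figures below. `time_command` is a hypothetical helper name, the sketch works only on Unix-like systems, and, as with /usr/bin/time, maxrss comes back in kilobytes on Linux but in bytes on macOS.

```python
import resource
import subprocess


def time_command(argv):
    """Run argv to completion and return (stdout, user_s, sys_s, maxrss).

    user_s and sys_s are CPU-time deltas of RUSAGE_CHILDREN around this run.
    maxrss is not a delta: the kernel reports the largest peak RSS of any
    child this process has waited for, so it is only meaningful if this is
    the biggest child so far. Units: kilobytes on Linux, bytes on macOS.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    # check=True raises CalledProcessError if the command exits non-zero.
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user_s = after.ru_utime - before.ru_utime
    sys_s = after.ru_stime - before.ru_stime
    return result.stdout, user_s, sys_s, after.ru_maxrss
```

For example, passing the argument list of run (ii) below (`jq`, `-n`, `--stream`, the filter, `citylots.json`) would return jq's output together with its user time, system time and peak RSS, assuming jq and citylots.json are available.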
Here are the results of my timings on a 3GHz 16GB RAM machine:
(i) jj 'features.10000.properties.LOT_NUM' -i citylots.json
091
user 0.01s
sys 0.14s
maxrss 197627904
(ii) jq-1.5 -n --stream 'first(inputs | select(.[0] == ["features",10000,"properties","LOT_NUM"])) | .[1]' citylots.json
"091"
user 0.60s
sys 0.00s
maxrss 2084864
(iii) As above but with jq-1.6rc1
"091"
user 0.61s
sys 0.00s
maxrss 2072576
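For readers unfamiliar with jq's stream form: with --stream, jq turns the input into a sequence of events. Each leaf (a scalar or an empty array/object) arrives as [path, value], and each closing container arrives as a one-element [path]. The filter in (ii) keeps the first leaf event whose path matches and prints its value. The Python sketch below mimics only the leaf events, generated from a document that is already parsed in memory. It therefore shows the event shape and the early-exit lookup, not the memory savings of a real streaming parser. `leaf_events` and `first_at` are hypothetical names.

```python
import json


def leaf_events(value, path=()):
    """Yield (path, leaf) pairs in document order, like jq --stream leaf events.

    Scalars and empty containers count as leaves, as in jq's stream form.
    The one-element "closing" events jq also emits are deliberately omitted.
    """
    if isinstance(value, dict) and value:
        for key, child in value.items():
            yield from leaf_events(child, path + (key,))
    elif isinstance(value, list) and value:
        for index, child in enumerate(value):
            yield from leaf_events(child, path + (index,))
    else:
        yield list(path), value


def first_at(events, target):
    """Rough analogue of: first(inputs | select(.[0] == target)) | .[1]

    Stops consuming events at the first match; returns None (where jq would
    produce no output) if nothing matches.
    """
    return next((leaf for path, leaf in events if path == target), None)


doc = json.loads('{"features": [{"properties": {"LOT_NUM": "091"}}]}')
print(first_at(leaf_events(doc), ["features", 0, "properties", "LOT_NUM"]))
```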
Hi pkoppstein, I'm looking into these issues. I believe jj is buffering too much data prior to processing, and low-memory systems suffer when dealing with large JSON files. I'll look into it ASAP and keep you posted. Thanks!
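The cost of buffering a whole input before processing it, compared with processing it incrementally, shows up even without a JSON parser. This sketch is not jj's code (jj is written in Go). It is a generic Python illustration in which both functions count newlines in the same file, one after reading the entire file into memory and one in fixed-size chunks, with `tracemalloc` reporting each function's peak Python heap allocation (which is not the same measure as maxrss). The function names and the 64 KiB chunk size are arbitrary choices for illustration.

```python
import os
import tempfile
import tracemalloc


def peak_bytes(fn, *args):
    """Return the peak Python heap allocation (bytes) seen while fn(*args) runs."""
    tracemalloc.start()
    try:
        fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def count_newlines_buffered(path):
    # Reads the whole file into one bytes object before doing any work,
    # so peak memory grows with the file size.
    with open(path, "rb") as f:
        return f.read().count(b"\n")


def count_newlines_chunked(path, chunk_size=64 * 1024):
    # Processes fixed-size chunks, so peak memory stays near chunk_size
    # no matter how large the file is.
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += chunk.count(b"\n")
    return total


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'{"n": 1}\n' * 500_000)  # about 4.5 MB of sample input
sample = tmp.name

results_match = count_newlines_buffered(sample) == count_newlines_chunked(sample) == 500_000
buffered_peak = peak_bytes(count_newlines_buffered, sample)
chunked_peak = peak_bytes(count_newlines_chunked, sample)
os.remove(sample)
print(f"buffered peak: {buffered_peak} bytes, chunked peak: {chunked_peak} bytes")
```

A streaming JSON parser applies the same idea at the token level, so its memory use need not grow with the size of the input file.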
@tidwall - No doubt jj's memory utilization could be improved but please understand that that was not the point of this "issue". I was just suggesting that (a) an apples-to-apples comparison with jq would be appropriate on the README page (i.e., using the streaming parser in both cases); and (b) some empirical information about memory utilization would also be helpful.
Speaking of documentation, some information about the intended behavior of jj on "wonky" JSON would also be helpful. If the intent is that jj behavior on what I call quasi-JSON is undefined, then so be it :-)
Meanwhile, I'm really impressed that jj is so fast on valid JSON!