CBOR vs. JSON performance -- why is CBOR.encode so much slower than JSON.stringify?

emily785 opened this issue a year ago

In my tests, CBOR.encode is roughly 3x slower than JSON.stringify, while CBOR.decode is roughly 3.7x faster than JSON.parse.

It would be incredible if CBOR.encode could achieve performance similar to JSON.stringify.

Has anyone looked into this more closely? I've tried various JSON objects and the numbers come out about the same every time.

json str size=130217
1000 iterations

JSON.parse: 1610 ms
JSON.stringify: 422 ms

CBOR.decode: 421 ms
CBOR.encode: 1375 ms
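
For anyone wanting to reproduce this kind of comparison, here is a minimal timing sketch. It is not the script used for the numbers above: the helper names (timeMs, benchCodec) and the small sample object are made up for illustration, and the CBOR part only runs where a global CBOR object with encode/decode exists (as in Duktape builds with CBOR support).

```javascript
// Hypothetical helper: call fn(arg) `iterations` times, return elapsed ms.
function timeMs(fn, arg, iterations) {
    var start = Date.now();
    for (var i = 0; i < iterations; i++) {
        fn(arg);
    }
    return Date.now() - start;
}

// Hypothetical driver: `codec` is any object with encode/decode functions.
function benchCodec(codec, value, iterations) {
    var encoded = codec.encode(value); // encode once to get decode input
    return {
        encodeMs: timeMs(codec.encode, value, iterations),
        decodeMs: timeMs(codec.decode, encoded, iterations)
    };
}

// Small stand-in object; the real test used a much larger JSON document.
var sample = { name: 'example', tags: ['a', 'b', 'c'], nested: { n: 1.5, ok: true } };

var results = {
    json: benchCodec({ encode: JSON.stringify, decode: JSON.parse }, sample, 1000)
};

// Only benchmark CBOR when the global is actually present.
if (typeof CBOR !== 'undefined') {
    results.cbor = benchCodec({ encode: CBOR.encode, decode: CBOR.decode }, sample, 1000);
}
```

Date.now() has millisecond resolution, so small inputs need a high iteration count before the numbers mean much.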

Sami Vaarala · Answer 1 · Wed Sep 06 2023 21:59:12 GMT+0800 (China Standard Time)

With the JSON fast path enabled, JSON encoding is relatively well optimized; the CBOR encoder has not been refined to the same extent, which explains some of the difference.

In principle it should be trivial to CBOR-encode strings, but since CBOR text strings need to be pure UTF-8, there's an "is this valid UTF-8" check for each string during encoding. This is currently a full string scan that has to finish before the encode can proceed. It should be possible to optimize this to match JSON.stringify performance.

However, in master there's (soon) an even simpler fix: with WTF-8 support, duk_hstring will have a flag which indicates whether the string is pure UTF-8 or needs WTF-8 extensions (= unpaired surrogates). The string scan can then be removed, which should make encoding very fast.
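
To make "unpaired surrogates" concrete, here is a rough script-level sketch of the property such a flag would record. This is not Duktape's implementation (the real check runs in C over the internal string bytes); hasUnpairedSurrogate is a hypothetical name used only for illustration. It does show why the check costs a full pass over the string.

```javascript
// Hypothetical illustration: returns true if `str` contains a surrogate
// code unit that is not part of a valid high+low pair. Such a string has
// no valid UTF-8 encoding and would need the WTF-8 extension.
function hasUnpairedSurrogate(str) {
    for (var i = 0; i < str.length; i++) {
        var c = str.charCodeAt(i);
        if (c >= 0xD800 && c <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            var next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
            if (next >= 0xDC00 && next <= 0xDFFF) {
                i++; // valid pair, skip the low half
            } else {
                return true;
            }
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            // Low surrogate with no preceding high surrogate.
            return true;
        }
    }
    return false;
}
```

With a precomputed flag, the answer is a single bit test instead of a loop like this on every encode.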

emily785 · Answer 2 · Thu Sep 07 2023 04:47:50 GMT+0800 (China Standard Time)

My test object actually contained a fair number of strings.

Thanks for the information. I looked into duk__cbor_encode_string_top and noticed you already have some test options, DUK_CBOR_TEXT_STRINGS / DUK_CBOR_BYTE_STRINGS. It looks like I will be able to skip the UTF-8 check for my use case with one of these #defines.

I have to say, I'm really looking forward to future duktape updates. I hope you are doing well. Is there any way to donate to show support?

Sami Vaarala · Answer 3 · Sun Sep 10 2023 20:39:31 GMT+0800 (China Standard Time)

If you happen to test CBOR performance with DUK_CBOR_{TEXT,BYTE}_STRINGS enabled, it'd be nice to know how much impact that has. The same overhead should be eliminated once the "string is UTF-8" flag is added to duk_hstring.

> I have to say, I'm really looking forward to future duktape updates. I hope you are doing well. Is there any way to donate to show support?

Thanks! It's been a bit difficult finding time for Duktape in the past few years but slowly things are getting better. There's no active donation method right now, but a good way to give support is to provide concrete, reproducible and actionable issues and pulls :-)

emily785 · Answer 4 · Sun Sep 17 2023 01:05:18 GMT+0800 (China Standard Time)

My test json:
test.txt

100k iterations
Ran a few times
Varies a bit each time, computer stuff I guess. The results below are the general trend.
Fastest encode was with DUK_CBOR_BYTE_STRINGS but then decode becomes very slow for some reason.

I haven't looked any closer. I will try to research it properly one day.

Luckily I mostly use decode in my project, and that beats JSON, so I'm happy. Hopefully CBOR encode can be as fast as JSON.stringify some day, or close to it.

#define DUK_CBOR_DECODE_FASTPATH
JSON.stringify: 486 ms.
JSON.parse: 1804 ms.
CBOR.encode: 1726 ms.
CBOR.decode: 529 ms.

#define DUK_CBOR_TEXT_STRINGS
JSON.stringify: 488 ms.
JSON.parse: 1798 ms.
CBOR.encode: 1688 ms.
CBOR.decode: 532 ms.

#define DUK_CBOR_BYTE_STRINGS
JSON.stringify: 499 ms.
JSON.parse: 1801 ms.
CBOR.encode: 1586 ms.
CBOR.decode: 2651 ms. (???)

Sami Vaarala · Answer 5 · Thu Sep 21 2023 06:48:40 GMT+0800 (China Standard Time)

Thanks for the measurements 👍

> Fastest encode was with DUK_CBOR_BYTE_STRINGS but then decode becomes very slow for some reason.

This is probably because, when decoding back, CBOR byte strings decode into Uint8Array objects, which are much heavier than strings.
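
A practical consequence for script code, sketched below: values that went out as byte strings come back as Uint8Array and need an explicit conversion before they can be used as strings again. The toText helper is hypothetical, and the sketch assumes TextEncoder/TextDecoder are available in the environment.

```javascript
// Hypothetical helper: accept either a plain string or a Uint8Array of
// UTF-8 bytes and always return a string.
function toText(value) {
    if (typeof value === 'string') {
        return value;
    }
    if (value instanceof Uint8Array) {
        return new TextDecoder().decode(value); // defaults to UTF-8
    }
    throw new TypeError('expected string or Uint8Array');
}

// Stand-in for a decoded CBOR byte string carrying text content.
var bytes = new TextEncoder().encode('héllo');
var text = toText(bytes);
```

That extra conversion comes on top of the cost of allocating the Uint8Array objects themselves, so byte strings only pay off when the consumer can work with raw bytes.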