Managing large messages

Question

a412h opened this issue 2 months ago

I am using the example at:
https://github.com/warmcat/libwebsockets/blob/main/minimal-examples-lowlevel/ws-client/minimal-ws-client-binance/main.c
but for a different use case.
Everything works as expected (messages are received correctly), but large messages are split, and because I need to feed them to a parser (like simdjson), I get unclosed-string errors from the parser (so this is not an error from libwebsockets).
What is the recommended efficient approach here?
Should I increase the buffer size to avoid splitting messages, and if so, how and where?

Andy Green · Answer 1 · Sat May 18 2024 04:29:57 GMT+0800 (China Standard Time)

The lws approach is to deliver message content to user code as it arrives; it arrives in units of TCP frames, no matter what the logical fragmentation is at the ws layer. That contrasts with, e.g., the JS WebSocket way, which is to buffer the message content until the last part arrives. But another way to describe the JS approach of atomic messages is that it adds latency before user code can start processing the earliest data: you have to put what you have aside and wait until the whole thing has arrived before starting on it.

If your JSON parser is stateful, rather than one that needs to be given the whole message at the end, you can start finding out the meaning of the message while it is still arriving. You then get results at lower latency, even though this incremental, stateful parser may lack any SIMD woo.

a412_h · Answer 2 · Sat May 18 2024 05:08:23 GMT+0800 (China Standard Time)

Thanks for the insight. So, would the following approach be efficient:

1. Copy the parts of the message as they arrive, for example into a string that is incrementally appended to with each part of the message.
2. When the end-of-message character appears, the message is considered complete.
3. Then the string is reset, and we go back to 1.
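
The three steps above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: `struct msg_buf`, `msg_buf_append()`, `msg_buf_reset()` and `on_chunk()` are hypothetical names, not libwebsockets APIs, and the `is_final` flag stands in for however the end of the message is detected.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator for one in-flight message (not an lws type). */
struct msg_buf {
    char  *data; /* NUL-terminated accumulated bytes, or NULL before first use */
    size_t len;  /* bytes currently stored, excluding the NUL */
    size_t cap;  /* allocated size of data */
};

/* Step 1: append one received chunk, growing the buffer geometrically.
 * Returns 0 on success, -1 on allocation failure. */
static int msg_buf_append(struct msg_buf *b, const void *in, size_t n)
{
    assert(b != NULL);

    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 4096;
        char *p;

        while (cap < b->len + n + 1)
            cap *= 2;
        p = realloc(b->data, cap);
        if (!p)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    if (n)
        memcpy(b->data + b->len, in, n);
    b->len += n;
    b->data[b->len] = '\0';

    return 0;
}

/* Step 3: forget the content but keep the allocation for the next message. */
static void msg_buf_reset(struct msg_buf *b)
{
    assert(b != NULL);

    b->len = 0;
    if (b->data)
        b->data[0] = '\0';
}

/* Step 2: call once per received chunk; is_final is nonzero for the last one.
 * Returns 1 when b holds a complete message, 0 if more is expected,
 * -1 on allocation failure. */
static int on_chunk(struct msg_buf *b, const void *in, size_t n, int is_final)
{
    if (msg_buf_append(b, in, n))
        return -1;

    return is_final ? 1 : 0;
}
```

When `on_chunk()` returns 1, `b->data` can be handed to a whole-document parser and `msg_buf_reset()` called afterwards. Inside an lws receive callback, one plausible source for `is_final` is `lws_is_final_fragment(wsi) && !lws_remaining_packet_payload(wsi)` rather than a terminator character, though that wiring is an assumption and is not shown here.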

And which parser would you recommend for the incremental approach of lws?

Andy Green · Answer 3 · Sat May 18 2024 06:40:08 GMT+0800 (China Standard Time)

If you process the parts incrementally (meaning the processing is 'stateful', because it can pause on any character boundary), you do not need any of those copying and appending steps, because you no longer need the whole message in memory at the same time. You process what just came and throw it away, keeping just the interesting parts, or triggering behaviours from the processing as they happen. In fact, it's no problem to process huge or endless JSON messages that wouldn't even fit in memory all at once.
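
To make "stateful" concrete, here is a minimal sketch in C of a scanner that keeps its state between chunks, so a chunk may end in the middle of a string or right after a backslash. It is not a JSON parser and not part of lws; `struct json_scan` and its functions are hypothetical names, and all it does is report when the top-level object or array has closed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scanner state carried between chunks (not an lws type). */
struct json_scan {
    int depth;     /* current nesting level of {} and [] */
    int in_string; /* nonzero while inside a "..." string */
    int escaped;   /* nonzero if the previous string char was a backslash */
    int complete;  /* set once the top-level object or array has closed */
};

static void json_scan_init(struct json_scan *s)
{
    assert(s != NULL);

    s->depth = 0;
    s->in_string = 0;
    s->escaped = 0;
    s->complete = 0;
}

/* Feed one chunk of any size; nothing from earlier chunks is kept except
 * the few integers in *s. Returns 1 once the top-level container has
 * closed, otherwise 0 (more input needed). */
static int json_scan_feed(struct json_scan *s, const char *in, size_t n)
{
    size_t i;

    assert(s != NULL && (in != NULL || n == 0));

    for (i = 0; i < n && !s->complete; i++) {
        char c = in[i];

        if (s->in_string) {
            /* brackets inside strings must not affect the depth */
            if (s->escaped)
                s->escaped = 0;
            else if (c == '\\')
                s->escaped = 1;
            else if (c == '"')
                s->in_string = 0;
            continue;
        }

        switch (c) {
        case '"':
            s->in_string = 1;
            break;
        case '{':
        case '[':
            s->depth++;
            break;
        case '}':
        case ']':
            if (s->depth > 0 && --s->depth == 0)
                s->complete = 1;
            break;
        default:
            break;
        }
    }

    return s->complete;
}
```

Because the state lives in the struct rather than in the input, feeding `{"a":"x}` and then `y","b":[1]}` gives the same result as feeding the whole text at once: the first call returns 0 with the scanner still inside a string, and the second returns 1.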

Lws includes a stateful JSON parser (LEJP): you show it what just came, and it fires callbacks as it parses out JSON elements.

https://github.com/warmcat/libwebsockets/blob/main/include/libwebsockets/lws-lejp.h

You can see worked test cases here

https://github.com/warmcat/libwebsockets/blob/main/minimal-examples-lowlevel/api-tests/api-test-lejp/main.c

and a tool to decompose JSON into callbacks from the command line

https://github.com/warmcat/libwebsockets/blob/main/test-apps/test-lejp.c