zeek / broker

Zeek's Messaging Library

Home Page: https://docs.zeek.org/projects/broker

Websocket API: messages truncated to 4KB

simeonmiteff opened this issue

Broker's websocket API seems to be truncating incoming JSON messages to 4KB.

Using this zeek script:

redef exit_only_after_terminate = T;
redef Broker::disable_ssl = T;

global test_topic = "/topic/test";

export
    {
    global ping: event(v: vector of count);
    global pong: event(v: vector of count);
    }

event ping(v: vector of count)
    {
    print fmt("receiver got ping with %d counts", |v|);
    local e = Broker::make_event(pong, v);
    Broker::publish(test_topic, e);
    }

event zeek_init()
    {
    Broker::listen_websocket();
    Broker::subscribe(test_topic);
    }

And testing by hand with websocat and a 4097-byte JSON encoding of an event:

$ websocat ws://localhost:9997/v1/messages/json
[]
{"type": "ack", "endpoint": "0f19c9e4-f296-5ba6-b32d-a98bcd1443af", "version": "2.5.0-dev"}
{"@data-type":"vector","data":[{"@data-type":"count","data":1},{"@data-type":"count","data":1},{"@data-type":"vector","data":[{"@data-type":"string","data":"ping"},{"@data-type":"vector","data":[{"@data-type":"vector","data":[{"@data-type":"count","data":100000},{"@data-type":"count","data":100001},{"@data-type":"count","data":100002},{"@data-type":"count","data":100003},{"@data-type":"count","data":100004},{"@data-type":"count","data":100005},{"@data-type":"count","data":100006},{"@data-type":"count","data":100007},{"@data-type":"count","data":100008},{"@data-type":"count","data":100009},{"@data-type":"count","data":100010},{"@data-type":"count","data":100011},{"@data-type":"count","data":100012},{"@data-type":"count","data":100013},{"@data-type":"count","data":100014},{"@data-type":"count","data":100015},{"@data-type":"count","data":100016},{"@data-type":"count","data":100017},{"@data-type":"count","data":100018},{"@data-type":"count","data":100019},{"@data-type":"count","data":100020},{"@data-type":"count","data":100021},{"@data-type":"count","data":100022},{"@data-type":"count","data":100023},{"@data-type":"count","data":100024},{"@data-type":"count","data":100025},{"@data-type":"count","data":100026},{"@data-type":"count","data":100027},{"@data-type":"count","data":100028},{"@data-type":"count","data":100029},{"@data-type":"count","data":100030},{"@data-type":"count","data":100031},{"@data-type":"count","data":100032},{"@data-type":"count","data":100033},{"@data-type":"count","data":100034},{"@data-type":"count","data":100035},{"@data-type":"count","data":100036},{"@data-type":"count","data":100037},{"@data-type":"count","data":100038},{"@data-type":"count","data":100039},{"@data-type":"count","data":100040},{"@data-type":"count","data":100041},{"@data-type":"count","data":100042},{"@data-type":"count","data":100043},{"@data-type":"count","data":100044},{"@data-type":"count","data":100045},{"@data-type":"count","data":100046},{"@data-type":"count","data":100047
},{"@data-type":"count","data":100048},{"@data-type":"count","data":100049},{"@data-type":"count","data":100050},{"@data-type":"count","data":100051},{"@data-type":"count","data":100052},{"@data-type":"count","data":100053},{"@data-type":"count","data":100054},{"@data-type":"count","data":100055},{"@data-type":"count","data":100056},{"@data-type":"count","data":100057},{"@data-type":"count","data":100058},{"@data-type":"count","data":100059},{"@data-type":"count","data":100060},{"@data-type":"count","data":100061},{"@data-type":"count","data":100062},{"@data-type":"count","data":100063},{"@data-type":"count","data":100064},{"@data-type":"count","data":100065},{"@data-type":"count","data":100066},{"@data-type":"count","data":100067},{"@data-type":"count","data":100068},{"@data-type":"count","data":100069},{"@data-type":"count","data":100070},{"@data-type":"count","data":100071},{"@data-type":"count","data":100072},{"@data-type":"count","data":100073},{"@data-type":"count","data":100074},{"@data-type":"count","data":100075},{"@data-type":"count","data":100076},{"@data-type":"count","data":100077},{"@data-type":"count","data":100078},{"@data-type":"count","data":100079},{"@data-type":"count","data":100080},{"@data-type":"count","data":100081},{"@data-type":"count","data":100082},{"@data-type":"count","data":100083},{"@data-type":"count","data":100084},{"@data-type":"count","data":100085},{"@data-type":"count","data":100086},{"@data-type":"count","data":100087},{"@data-type":"count","data":100088},{"@data-type":"count","data":100089},{"@data-type":"count","data":100090},{"@data-type":"count","data":100091},{"@data-type":"count","data":100092},{"@data-type":"count","data":100093},{"@data-type":"count","data":100094},{"@data-type":"count","data":100095},{"@data-type":"count","data":100096},{"@data-type":"count","data":100097},{"@data-type":"count","data":100098},{"@data-type":"count","data":100099},{"@data-type":"count","data":100100},{"@data-type":"count","data":100101},
{"@data-type":"count","data":10010200000000}]}]}]}],"topic":"/topic/test","type":"data-message"}
{"type": "error", "code": "deserialization_failed", "context": "input #8 contained malformed JSON -> caf::pec::unexpected_eof(2, 2)"}

Removing one of the zeros from the last count brings it down to 4096 bytes:

{"@data-type":"vector","data":[{"@data-type":"count","data":1},{"@data-type":"count","data":1},{"@data-type":"vector","data":[{"@data-type":"string","data":"ping"},{"@data-type":"vector","data":[{"@data-type":"vector","data":[{"@data-type":"count","data":100000},{"@data-type":"count","data":100001},{"@data-type":"count","data":100002},{"@data-type":"count","data":100003},{"@data-type":"count","data":100004},{"@data-type":"count","data":100005},{"@data-type":"count","data":100006},{"@data-type":"count","data":100007},{"@data-type":"count","data":100008},{"@data-type":"count","data":100009},{"@data-type":"count","data":100010},{"@data-type":"count","data":100011},{"@data-type":"count","data":100012},{"@data-type":"count","data":100013},{"@data-type":"count","data":100014},{"@data-type":"count","data":100015},{"@data-type":"count","data":100016},{"@data-type":"count","data":100017},{"@data-type":"count","data":100018},{"@data-type":"count","data":100019},{"@data-type":"count","data":100020},{"@data-type":"count","data":100021},{"@data-type":"count","data":100022},{"@data-type":"count","data":100023},{"@data-type":"count","data":100024},{"@data-type":"count","data":100025},{"@data-type":"count","data":100026},{"@data-type":"count","data":100027},{"@data-type":"count","data":100028},{"@data-type":"count","data":100029},{"@data-type":"count","data":100030},{"@data-type":"count","data":100031},{"@data-type":"count","data":100032},{"@data-type":"count","data":100033},{"@data-type":"count","data":100034},{"@data-type":"count","data":100035},{"@data-type":"count","data":100036},{"@data-type":"count","data":100037},{"@data-type":"count","data":100038},{"@data-type":"count","data":100039},{"@data-type":"count","data":100040},{"@data-type":"count","data":100041},{"@data-type":"count","data":100042},{"@data-type":"count","data":100043},{"@data-type":"count","data":100044},{"@data-type":"count","data":100045},{"@data-type":"count","data":100046},{"@data-type":"count","data":100047
},{"@data-type":"count","data":100048},{"@data-type":"count","data":100049},{"@data-type":"count","data":100050},{"@data-type":"count","data":100051},{"@data-type":"count","data":100052},{"@data-type":"count","data":100053},{"@data-type":"count","data":100054},{"@data-type":"count","data":100055},{"@data-type":"count","data":100056},{"@data-type":"count","data":100057},{"@data-type":"count","data":100058},{"@data-type":"count","data":100059},{"@data-type":"count","data":100060},{"@data-type":"count","data":100061},{"@data-type":"count","data":100062},{"@data-type":"count","data":100063},{"@data-type":"count","data":100064},{"@data-type":"count","data":100065},{"@data-type":"count","data":100066},{"@data-type":"count","data":100067},{"@data-type":"count","data":100068},{"@data-type":"count","data":100069},{"@data-type":"count","data":100070},{"@data-type":"count","data":100071},{"@data-type":"count","data":100072},{"@data-type":"count","data":100073},{"@data-type":"count","data":100074},{"@data-type":"count","data":100075},{"@data-type":"count","data":100076},{"@data-type":"count","data":100077},{"@data-type":"count","data":100078},{"@data-type":"count","data":100079},{"@data-type":"count","data":100080},{"@data-type":"count","data":100081},{"@data-type":"count","data":100082},{"@data-type":"count","data":100083},{"@data-type":"count","data":100084},{"@data-type":"count","data":100085},{"@data-type":"count","data":100086},{"@data-type":"count","data":100087},{"@data-type":"count","data":100088},{"@data-type":"count","data":100089},{"@data-type":"count","data":100090},{"@data-type":"count","data":100091},{"@data-type":"count","data":100092},{"@data-type":"count","data":100093},{"@data-type":"count","data":100094},{"@data-type":"count","data":100095},{"@data-type":"count","data":100096},{"@data-type":"count","data":100097},{"@data-type":"count","data":100098},{"@data-type":"count","data":100099},{"@data-type":"count","data":100100},{"@data-type":"count","data":100101},
{"@data-type":"count","data":1001020000000}]}]}]}],"topic":"/topic/test","type":"data-message"}

...it works and the zeek script outputs receiver got ping with 103 counts.
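Rather than hand-editing digits to hit a given size, payloads like the two above can be generated. The following is a minimal sketch: the envelope layout is copied from the messages in this report, while the helper names (`count`, `make_ping`, `encoded_size`, `smallest_count_over`) are made up for illustration. Sizes are measured on the compact encoding without the line breaks shown above, so the exact count at which 4096 bytes is crossed may differ slightly from the hand-built messages.

```python
import json


def count(n):
    """Encode one Broker count value in the JSON data format used above."""
    return {"@data-type": "count", "data": n}


def make_ping(values, topic="/topic/test"):
    """Build a data-message carrying a ping event with one vector argument.

    The nesting mirrors the hand-written messages in this report.
    """
    return {
        "type": "data-message",
        "topic": topic,
        "@data-type": "vector",
        "data": [
            count(1),
            count(1),
            {"@data-type": "vector", "data": [
                {"@data-type": "string", "data": "ping"},
                {"@data-type": "vector", "data": [
                    {"@data-type": "vector",
                     "data": [count(v) for v in values]},
                ]},
            ]},
        ],
    }


def encoded_size(msg):
    """Size in bytes of the compact UTF-8 JSON encoding."""
    return len(json.dumps(msg, separators=(",", ":")).encode("utf-8"))


def smallest_count_over(limit, start=100000):
    """Smallest number of counts whose encoding exceeds `limit` bytes."""
    n = 0
    while encoded_size(make_ping(range(start, start + n))) <= limit:
        n += 1
    return n


if __name__ == "__main__":
    n = smallest_count_over(4096)
    print(n, encoded_size(make_ping(range(100000, 100000 + n))))
```

Sending the message for `n - 1` and then for `n` counts gives a pair of payloads on either side of the 4096-byte boundary.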

Are you sure this isn't a limitation of your tool? I tried to reproduce this with a btest, but it's working for me. The test below produces JSON that's about 12 KB in size by creating an array with 200 count values and communicating with Broker using Python:

# @TEST-GROUP: web-socket
#
# @TEST-PORT: BROKER_WEB_SOCKET_PORT
#
# @TEST-EXEC: btest-bg-run node "broker-node --config-file=../node.cfg"
# @TEST-EXEC: btest-bg-run recv "python3 ../recv.py >recv.out"
# @TEST-EXEC: $SCRIPTS/wait-for-file recv/ready 15 || (btest-bg-wait -k 1 && false)
#
# @TEST-EXEC: btest-bg-run send "python3 ../send.py"
#
# @TEST-EXEC: $SCRIPTS/wait-for-file recv/done 30 || (btest-bg-wait -k 1 && false)
# @TEST-EXEC: btest-diff recv/recv.out
#
# @TEST-EXEC: btest-bg-wait -k 1

@TEST-START-FILE node.cfg

broker {
  disable-ssl = true
}
topics = ["/test"]
verbose = true

@TEST-END-FILE

@TEST-START-FILE recv.py

import asyncio, websockets, os, time, json, sys


from datetime import datetime

ws_port = os.environ['BROKER_WEB_SOCKET_PORT'].split('/')[0]

ws_url = f'ws://localhost:{ws_port}/v1/messages/json'

async def do_run():
    # Try up to 30 times.
    connected  = False
    for i in range(30):
        try:
            ws = await websockets.connect(ws_url)
            connected  = True
            # send filter and wait for ack
            await ws.send('["/test"]')
            ack_json = await ws.recv()
            ack = json.loads(ack_json)
            if 'type' not in ack or ack['type'] != 'ack':
                print('*** unexpected ACK from server:')
                print(ack_json)
                sys.exit()
            # tell btest to start the sender now
            with open('ready', 'w') as f:
                f.write('ready')
            # the message must be valid JSON
            msg_json = await ws.recv()
            msg = json.loads(msg_json)
            # pretty-print to stdout (redirected to recv.out)
            print(json.dumps(msg, indent=2))
            # tell btest we're done
            with open('done', 'w') as f:
                f.write('done')
            await ws.close()
            sys.exit()
        except Exception:
            if not connected:
                print(f'failed to connect to {ws_url}, try again', file=sys.stderr)
                time.sleep(1)
            else:
                sys.exit()
loop = asyncio.get_event_loop()
loop.run_until_complete(do_run())

@TEST-END-FILE

@TEST-START-FILE send.py

import asyncio, websockets, os, json, sys

ws_port = os.environ['BROKER_WEB_SOCKET_PORT'].split('/')[0]

ws_url = f'ws://localhost:{ws_port}/v1/messages/json'

msg_data = []

for i in range(200):
    msg_data.append({
        '@data-type': 'count',
        'data': i
    })

# Data message carrying a vector of 200 count values.
msg = {
    'type': 'data-message',
    'topic': '/test',
    '@data-type': "vector",
    'data': msg_data
}

async def do_run():
    async with websockets.connect(ws_url) as ws:
      await ws.send('[]')
      await ws.recv() # wait for ACK
      msg_str = json.dumps(msg)
      if len(msg_str) < 4096:
          raise RuntimeError('message is too short')
      await ws.send(msg_str)
      resp = await ws.recv()
      print(resp)
      await ws.close()

loop = asyncio.get_event_loop()
loop.run_until_complete(do_run())

@TEST-END-FILE

Here's the generated JSON as text file: tmp.txt.

Small correction: the text file was actually from the print in the receiver after pretty-printing. However, even if I pretty-print at the sender (msg_str = json.dumps(msg, indent=2)) to make sure it's actually sending 12 KB, it still works.

Hi @Neverlord

You're right, there is a problem with websocat truncating the message to 4KB, however it gets more interesting...

I was using websocat to describe the issue because I had also seen this happen with my Golang Broker WebSocket API client library, and I wanted to verify it with something independent of my own code (assuming websocat is not broken, which it turns out to be). I confirmed your suspicion by capturing the exchange and dissecting it with Wireshark (the zeek script disables Broker SSL).

On the other hand, my Golang code is sending the whole message (also confirmed with wireshark) and instead of responding with an error, broker is just terminating the connection.

I have some sample code here: https://github.com/simeonmiteff/broker-issue-357 (you can build this with go build).

This is what it looks like with <4KB messages:

$ ./broker-issue-357 100
2023/05/28 17:25:52 connected to remote endpoint with UUID=55791c58-dae3-538e-95db-0019faa2ef24 version=2.5.0-dev
2023/05/28 17:25:52 sent 100 counts
2023/05/28 17:25:52 received 100 counts
2023/05/28 17:25:53 sent 100 counts
2023/05/28 17:25:53 received 100 counts
^C2023/05/28 17:25:53 received signal, shutting down websocket
2023/05/28 17:25:54 shutting down websocket

...and the zeek script:

$ zeek listen.zeek 
peer added, [id=f2646bce-396a-599d-bd6e-44721e5ef288, network=[address=::1, bound_port=47730/tcp]]
receiver got ping with 100 counts
receiver got ping with 100 counts
peer lost, [id=f2646bce-396a-599d-bd6e-44721e5ef288, network=[address=::1, bound_port=47730/tcp]]

And with >4KB messages:

$ ./broker-issue-357 
2023/05/28 17:24:42 connected to remote endpoint with UUID=2010a6e7-2145-50d3-90f7-ecc8f269f16e version=2.5.0-dev
2023/05/28 17:24:42 sent 200 counts
2023/05/28 17:24:42 websocket: close 1006 (abnormal closure): unexpected EOF

I can see broker closing the socket (strace output):

980303 accept(18, NULL, NULL)           = 22
980303 getsockopt(22, SOL_SOCKET, SO_SNDBUF, [2626560], [4]) = 0
980303 recvfrom(22, "GET /v1/messages/json HTTP/1.1\r\n"..., 4096, MSG_NOSIGNAL, NULL, NULL) = 201
980303 getsockname(22, {sa_family=AF_INET6, sin6_port=htons(9997), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, [128 => 28]) = 0
980303 getsockname(22, {sa_family=AF_INET6, sin6_port=htons(9997), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, [128 => 28]) = 0
980303 getpeername(22, {sa_family=AF_INET6, sin6_port=htons(48020), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, [128 => 28]) = 0
980303 getpeername(22, {sa_family=AF_INET6, sin6_port=htons(48020), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, [128 => 28]) = 0
980303 sendto(22, "HTTP/1.1 101 Switching Protocols"..., 129, MSG_NOSIGNAL, NULL, 0) = 129
980303 recvfrom(22, "\201\220\361\4r\325\252&]\241\236t\33\266\336p\27\246\205&/\337", 4096, MSG_NOSIGNAL, NULL, NULL) = 22
980303 sendto(22, "\201[{\"type\": \"ack\", \"endpoint\": \"2"..., 93, MSG_NOSIGNAL, NULL, 0) = 93
980303 recvfrom(22, "\1\376\20\0kk\200V\20I\3002\n\37\341{\37\22\3603IQ\242 \16\10\3649\31I\254t"..., 4096, MSG_NOSIGNAL, NULL, NULL) = 4096
980303 recvfrom(22, "I\10\357#\5\37\242z", 8, MSG_NOSIGNAL, NULL, NULL) = 8
980303 recvfrom(22, "\200\376\r\376\324(\237\334\366L\376\250\265\n\245\355\344\30\256\354\340U\263\247\366h\373\275\240I\262\250"..., 4104, MSG_NOSIGNAL, NULL, NULL) = 3590
980303 close(22)                        = 0
980296 --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---

Your Python test suggests I have a different bug. That would mean the message sent via the Golang library is not well-formed and is hitting some code path in broker that causes it to silently drop the connection (still a bug!). But then I would expect it to fail for <4KB messages as well, and those work just fine...

Can you share the JSON that your Go program generates? That would make trying to reproduce this issue easier.

@Neverlord no problem: go.json.txt

This was copied out of wireshark, so I'm fairly certain it is exactly what broker sees.

Just an idea, since you already have looked at this in Wireshark: is the Go library sending this entire JSON as a single WebSocket frame? Broker expects a full JSON message per frame. So if your library splits up the messages into multiple frames, that could cause Broker to think it deals with a broken client.

@Neverlord indeed, the Go websocket library (gorilla/websocket) is sending the buffer as a single message in two frames (i.e., employing fragmentation). Here is the PCAP: go.pcap.gz
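The two frames are also visible in the strace output earlier in this thread: each recvfrom buffer starts with a WebSocket frame header. As a minimal sketch (the function name `parse_frame_header` is made up for illustration, and this is not Broker's parser), the header layout from RFC 6455 section 5.2 can be decoded like this, with the three headers transcribed from the octal escapes in the strace lines:

```python
import struct

OPCODES = {0x0: "continuation", 0x1: "text", 0x2: "binary",
           0x8: "close", 0x9: "ping", 0xA: "pong"}


def parse_frame_header(buf):
    """Decode the start of a WebSocket frame (RFC 6455, section 5.2).

    Returns (fin, opcode, masked, payload_len, header_len), where
    header_len includes the 4-byte masking key if one is present.
    """
    if len(buf) < 2:
        raise ValueError("need at least 2 bytes")
    fin = bool(buf[0] & 0x80)
    opcode = buf[0] & 0x0F
    masked = bool(buf[1] & 0x80)
    length = buf[1] & 0x7F
    offset = 2
    if length == 126:
        # 16-bit extended payload length, network byte order
        if len(buf) < 4:
            raise ValueError("truncated 16-bit length")
        (length,) = struct.unpack_from("!H", buf, offset)
        offset += 2
    elif length == 127:
        # 64-bit extended payload length, network byte order
        if len(buf) < 10:
            raise ValueError("truncated 64-bit length")
        (length,) = struct.unpack_from("!Q", buf, offset)
        offset += 8
    if masked:
        offset += 4  # masking key follows the length field
    return fin, opcode, masked, length, offset


if __name__ == "__main__":
    # Header bytes taken from the three client-to-server recvfrom lines.
    for header in (b"\x81\x90", b"\x01\xfe\x10\x00", b"\x80\xfe\x0d\xfe"):
        fin, opcode, masked, length, header_len = parse_frame_header(header)
        print(f"fin={fin} opcode={OPCODES[opcode]} masked={masked} "
              f"payload={length} total={header_len + length}")
```

The first header is the unfragmented subscription frame (16 payload bytes, 22 bytes in total). The second is a text frame with FIN cleared and a 4096-byte payload, 4104 bytes in total, which matches the 4096 + 8 bytes read by the next two recvfrom calls. The third is a continuation frame with FIN set and 3582 payload bytes, 3590 bytes in total, which matches the last recvfrom before the close.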

The WebSocket RFC (RFC 6455, section 5.4) requires endpoints to support receiving fragmented messages, so (respectfully) I'd say not supporting this in broker is a bug:

o Clients and servers MUST support receiving both fragmented and unfragmented messages.
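To make the requirement concrete, here is a minimal sketch of fragmentation and reassembly for text messages. It models frames as `(fin, opcode, payload)` tuples instead of wire bytes; the names `fragment` and `Reassembler` are made up for illustration, and this is neither gorilla/websocket's nor Broker's code. Control frames, which may be interleaved between fragments, are left out.

```python
OP_CONTINUATION = 0x0
OP_TEXT = 0x1


def fragment(message, max_payload):
    """Split a text message into (fin, opcode, payload) frames.

    The first frame carries the text opcode, all later frames are
    continuation frames, and only the last frame has FIN set.
    """
    data = message.encode("utf-8")
    chunks = [data[i:i + max_payload]
              for i in range(0, len(data), max_payload)] or [b""]
    last = len(chunks) - 1
    return [(i == last, OP_TEXT if i == 0 else OP_CONTINUATION, chunk)
            for i, chunk in enumerate(chunks)]


class Reassembler:
    """Collects frames and returns a message once FIN is seen."""

    def __init__(self):
        self._parts = None  # None means no fragmented message in progress

    def feed(self, fin, opcode, payload):
        if opcode == OP_CONTINUATION:
            if self._parts is None:
                raise ValueError("continuation frame without a first frame")
            self._parts.append(payload)
        elif opcode == OP_TEXT:
            if self._parts is not None:
                raise ValueError("new message before the previous one ended")
            self._parts = [payload]
        else:
            raise ValueError(f"unsupported opcode {opcode:#x}")
        if not fin:
            return None  # message is not complete yet
        data = b"".join(self._parts)
        self._parts = None
        # Decode only after joining: a fragment boundary may fall inside
        # a multi-byte UTF-8 sequence.
        return data.decode("utf-8")


if __name__ == "__main__":
    frames = fragment("x" * 7678, 4096)
    r = Reassembler()
    results = [r.feed(*frame) for frame in frames]
    print([(fin, op, len(p)) for fin, op, p in frames], results[-1] == "x" * 7678)
```

With a 4096-byte limit, a 7678-byte message produces the same frame pattern as the capture: a text frame without FIN followed by a continuation frame with FIN.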

Meanwhile I'll see if there is a way to get gorilla/websocket not to use fragments.

Indeed, gorilla/websocket can be configured to send larger frames, and I can confirm that this allows me to send >4KB messages from Golang (by avoiding fragmentation). However, this makes my broker library hard to use, because the calling application has to know at connection time what the largest message will be.

The gorilla/websocket documentation has a section that describes this trade-off quite well (note that frame size limits are not a consideration, since fragmentation takes care of that):

The buffer sizes do not limit the size of a message that can be read or written by a connection.

Buffers are held for the lifetime of the connection by default. If the Dialer or Upgrader WriteBufferPool field is set, then a connection holds the write buffer only when writing a message.

Applications should tune the buffer sizes to balance memory use and performance. Increasing the buffer size uses more memory, but can reduce the number of system calls to read or write the network. In the case of writing, increasing the buffer size can reduce the number of frame headers written to the network.

Ah, I didn't even think about fragmented frames. But it seems like you're right, there's been a bug in how the WebSocket implementation handles opcodes of continuation frames. Can you try the branch topic/neverlord/gh-357? It should fix this bug.

@Neverlord the topic/neverlord/gh-357 branch (note version=2.7.0-dev) behaves the same as before:

$ ./broker-issue-357 200
2023/05/30 09:48:50 connected to remote endpoint with UUID=297b884d-118a-5376-a1ae-cfa61d5a519f version=2.7.0-dev
2023/05/30 09:48:50 sent 200 counts
2023/05/30 09:48:50 websocket: close 1006 (abnormal closure): unexpected EOF

I also think actor-framework/actor-framework@d3cfa1f may have introduced a new bug, I'm now seeing timestamp strings in the NetworkTimestamp metadata coming back from broker that aren't in the YYYY-MM-DDThh:mm:ss.sss format:

{
  "type": "data-message",
  "topic": "/topic/test",
  "@data-type": "vector",
  "data": [
    {
      "@data-type": "count",
      "data": 1
    },
    {
      "@data-type": "count",
      "data": 1
    },
    {
      "@data-type": "vector",
      "data": [
        {
          "@data-type": "string",
          "data": "pong"
        },
        {
          "@data-type": "vector",
          "data": [
            {
              "@data-type": "vector",
              "data": [
                {
                  "@data-type": "count",
                  "data": 100000
                }
              ]
            }
          ]
        },
        {
          "@data-type": "vector",
          "data": [
            {
              "@data-type": "vector",
              "data": [
                {
                  "@data-type": "count",
                  "data": 1
                },
                {
                  "@data-type": "timestamp",
                  "data": "2023-05-30T09:24:34.796706048+10:00"
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
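A client that expected millisecond timestamps can be made tolerant of the format shown above. This is a minimal sketch, assuming timestamps arrive with zero to nine fractional digits and an optional `Z` or `+hh:mm`/`-hh:mm` suffix; that assumption is based only on the one sample in this message, and `parse_timestamp` is a made-up helper name. Python's `datetime` stores microseconds, so the full nanosecond value is returned alongside it.

```python
import re
from datetime import datetime, timedelta, timezone

_TIMESTAMP = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})"
    r"(?:\.(\d{1,9}))?"          # optional fraction, up to nanoseconds
    r"(Z|[+-]\d{2}:\d{2})?$"     # optional UTC offset
)


def parse_timestamp(text):
    """Return (datetime, nanoseconds) for an ISO 8601 style timestamp.

    The datetime is truncated to microseconds; the second value keeps
    the full sub-second part in nanoseconds.
    """
    m = _TIMESTAMP.match(text)
    if m is None:
        raise ValueError(f"unrecognized timestamp: {text!r}")
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    # Right-pad the fraction so that ".796" means 796000000 ns.
    nanos = int((m.group(7) or "").ljust(9, "0"))
    zone = m.group(8)
    if zone is None:
        tz = None
    elif zone == "Z":
        tz = timezone.utc
    else:
        sign = -1 if zone[0] == "-" else 1
        tz = timezone(sign * timedelta(hours=int(zone[1:3]),
                                       minutes=int(zone[4:6])))
    dt = datetime(year, month, day, hour, minute, second,
                  nanos // 1000, tzinfo=tz)
    return dt, nanos


if __name__ == "__main__":
    print(parse_timestamp("2023-05-30T09:24:34.796706048+10:00"))
    print(parse_timestamp("2023-05-30T09:24:34.796"))
```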

@Neverlord if it helps, I modified your btest python client script to send fragments (payload is now an encoded zeek event), and it does the same thing as the Golang code:

import asyncio, websockets, os, json, sys

ws_port = os.environ['BROKER_WEB_SOCKET_PORT'].split('/')[0]

ws_url = f'ws://localhost:{ws_port}/v1/messages/json'

msg_data = []

for i in range(200):
    msg_data.append({
        '@data-type': 'count',
        'data': i
    })

# Data message wrapping a ping event whose argument is the vector of 200 counts.
msg = {
    'type': 'data-message',
    'topic': '/topic/test',
    '@data-type': "vector",
    'data': [
                {"@data-type": "count", "data": 1},
                {"@data-type": "count", "data": 1},
                {"@data-type": "vector", "data": [
                    {"@data-type": "string", "data": "ping"},
                    {"@data-type": "vector", "data":[
                        {"@data-type": "vector", "data": msg_data},
                        ]
                    }
                    ]
                }
            ]
}

async def do_run():
    async with websockets.connect(ws_url) as ws:
      await ws.send('[]')
      await ws.recv() # wait for ACK
      msg_str = json.dumps(msg)
      if len(msg_str) < 4096:
          raise RuntimeError('message is too short')
      frames = []
      while len(msg_str) > 4096:
          frames.append(msg_str[:4096])
          msg_str = msg_str[4096:]
      if len(msg_str) > 0:
          frames.append(msg_str)
      await ws.send(frames)
      resp = await ws.recv()
      print(resp)
      await ws.close()

loop = asyncio.get_event_loop()

try:
    loop.run_until_complete(do_run())
except KeyboardInterrupt:
    pass

(...) may have introduced a new bug, I'm now seeing timestamp strings in the NetworkTimestamp metadata coming back from broker that aren't in the YYYY-MM-DDThh:mm:ss.sss format:

It's a deliberate change to have unambiguous timestamps with resolutions from seconds to nanoseconds, but let's discuss this someplace else. I've dropped the commit related to the timestamps from the branch, so it shouldn't affect the testing.

And sorry for not testing the backport properly. It should work now and I've also added the Python btest as a regression test to the branch. Mind trying again?

@Neverlord yup - that fixes it 🎉. Thank you!

Re: timestamp format, I note #346 and it makes sense, I was just unprepared for it 😁