aantron / dream

Tidy, feature-complete Web framework

Home Page: https://aantron.github.io/dream/

Receiving websocket payloads larger than ~4K yields garbled data

cemerick opened this issue

Here's a minimal reproduction that lets one send arbitrarily-sized websocket payloads to the server; payload sizes > ~4K result in garbled (seemingly arbitrary) data:

open Dream

let rec bus_loop ws =
  let open Lwt.Infix in
  receive ws
  >>= function
  | None     -> close_websocket ws
  | Some msg ->
    let open String in
    let len = Int.min 40 (length msg) in
    log "Message (%d) ending with: %s" (length msg) (sub msg (length msg - len) len);
    bus_loop ws

let () =
  Lwt_main.run
  @@ serve ~interface:"0.0.0.0" ~port:9090
       (router
          [ get "/ws" (fun _req -> websocket bus_loop)
          ; (get "/"
            @@ fun _r ->
            Dream.respond
              {|<html>
           websocket payload size:
           <input type="number" id="file" />
           <button>Send</button>
           <script>
           function send (input) {
             let ws = new WebSocket("ws://localhost:9090/ws");
             ws.addEventListener("open", e => {
               let bt = ws.binaryType;
               let len = parseInt(input.value);
               let data = new Uint8Array(len);
               data.fill(33, 0, len);
               ws.binaryType = 'arraybuffer';
               ws.send(data);
               ws.binaryType = bt;
             });
           }

           document.querySelector("button").addEventListener("click", e => send(document.querySelector("input")));
           </script>
           </html>|}
            )
          ])

A sample output log looks like this (note that the messages being sent by the client consist entirely of exclamation points, just to easily see where the bad data starts):

29.03.22 03:18:38.261                       Message (4000) ending with: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
29.03.22 03:18:44.169                       Message (4090) ending with: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!�9
29.03.22 03:21:34.808                       Message (5000) ending with: ����������������������������������������
29.03.22 03:21:37.178                       Message (5000) ending with: wXwXwXwXwXwXwXwXwXwXwXwXwXwXwXwXwXwXwXwX

Of course, maybe there's another perfectly reasonable cause, but it sure feels like the websocket facility is successfully handling the first "frame" of data (which is probably 4096 bytes, including some header data, yielding something like 4000 bytes of expected data), but then only increasing the extent of the resulting string without actually copying over the payloads of websocket "frames" >= 2.

(BTW, I'll also gladly accept the news that I'm somehow running afoul of some necessary yet missing websocket handling 😟)
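
To make that hunch concrete, here is a purely illustrative OCaml sketch of the suspected failure mode (this is not actual Dream or websocket/af code): the reassembly buffer is sized for the whole message, but only the first fragment's bytes ever get copied in, so everything past the first ~4K is whatever the buffer happened to contain.

let broken_reassemble (fragments : string list) =
  let total = List.fold_left (fun n f -> n + String.length f) 0 fragments in
  let buffer = Bytes.create total in
  (* The buffer has the right total length, but only fragment 1 is blitted;
     later fragments should be copied in at increasing offsets and are skipped. *)
  (match fragments with
   | first :: _ -> Bytes.blit_string first 0 buffer 0 (String.length first)
   | [] -> ());
  Bytes.to_string buffer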

When I attempt to send much larger payloads (say, 20MB), even stranger things happen:

  1. a single payload sent (sometimes!) results in multiple messages received, as if the original single payload is not recognized as such
  2. the messages that are received have strange sizes, starting with one larger message, and then a pile of much smaller messages. Example output for that case:
29.03.22 13:11:51.406                       Message (131099) ending with: c=,�c=,�c=,�c=,�c=,�c=,�c=,�c=,�c=,�c=,�
29.03.22 13:11:51.406                       Message (99) ending with: c=,�c=,�c=,�c=,�c=,�c=,�c=,�c=,�c=,�c=,�
29.03.22 13:11:51.406                       Message (99) ending with: c=,�c=,�c=,�c=,�c=,�c=,�c=,�c=,�c=,�c=,�

...with pages and pages of log output indicating that, in this case, tons of 99-byte messages were received.

That larger message often seems to be something close to 131000 bytes, which seems like just as suspiciously round a number as the ~4K threshold I noted in my original report. 🤔
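
As a quick sanity check on that "round number" observation, both sizes sit right next to powers of two, which would be consistent with a fixed-size read buffer being involved somewhere (whether these are the actual buffer sizes in play is an assumption on my part):

let () =
  Printf.printf "4 KiB   = %d\n" (4 * 1024);   (* 4096, right at the ~4K threshold *)
  Printf.printf "128 KiB = %d\n" (128 * 1024)  (* 131072, just under the 131099-byte message *)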

This is almost certainly a problem in Dream or websocket/af. I unfortunately won't be able to look at this immediately, but I am reading the very helpful information and will try to address this soon.

I fished around in both Dream and websocketaf hoping to track down the specific cause of this, but without ultimate success. Some tidbits that might be helpful though:

  1. When receiving messages > ~4K, Dream does not receive fragments from the underlying stream when it should; specifically, it sometimes requires 2+ top-level messages to be sent before a deluge of fragments comes through (with the garbled data as described before).
  2. Chrome's websocket messages panel sometimes reports "Received a broken close frame containing a reserved status code." after a number of too-large messages are sent.

My hunch is that websocketaf is what's misbehaving. If Dream's streams were at fault, then things like large multipart form uploads should be breaking too; and that "broken close frame" report is probably not something that Dream's higher-level websocket support could provoke?

Until this is resolved, my workaround is to just add my own (3kb, yeuch lol) framing to known-large websocket messages. Obviously not ideal, but it seems to work well enough in the interim.
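
For reference, here is a minimal sketch of what that framing workaround looks like on the sending side. It assumes Dream.send's websocket-then-string shape; the chunk_size constant and send_framed helper are hypothetical names of my own, not part of Dream.

let chunk_size = 3 * 1024

let send_framed websocket message =
  let open Lwt.Infix in
  let length = String.length message in
  (* Announce the total length first so the receiving side knows how many
     bytes to accumulate before treating the message as complete. *)
  Dream.send websocket (string_of_int length) >>= fun () ->
  let rec loop offset =
    if offset >= length then Lwt.return_unit
    else
      let len = min chunk_size (length - offset) in
      Dream.send websocket (String.sub message offset len) >>= fun () ->
      loop (offset + len)
  in
  loop 0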

Edit: actually, doing my own framing doesn't work, either. As before, the correct amount of data is sent (in this case, per 3K message), but various parts of those "frames" contain, as far as I can tell, random data. Maybe there is a timing issue provoked by "large" amounts of data going through a websocket connection per some unit of time, which would affect both large, multi-frame messages and large-ish numbers of smaller messages sent very quickly. I'm declaring defeat on my end for now.

I've recently been debugging this, and it appears to be an upstream Websocket/af issue, as probably also suggested by anmonteiro/httpun-ws#48, because it can be worked around by changing the buffer size in Websocket/af. As far as I can tell, Dream already receives garbled data for fragments after the first, and I don't think Dream is clobbering, or even can clobber, Websocket/af's internal buffer. So I don't see how Dream can fix the reassembly of these messages.

cc @anmonteiro

Yes, it's definitely a problem upstream of Dream, and @anmonteiro and I have sporadically gone back and forth on the subject. I think I currently owe him a minimal repro app, which I hope to assemble in the next week or so.

In case it's useful, here is the diff from my "instrumented" Dream that I used to observe the issue and check that Dream gets garbled buffers already from Websocket/af:

diff --git a/src/http/shared/dune b/src/http/shared/dune
index 31b3c06..7b9d528 100644
--- a/src/http/shared/dune
+++ b/src/http/shared/dune
@@ -5,6 +5,7 @@
   bigstringaf
   dream-pure
   dream-httpaf.dream-websocketaf
+  cstruct
  )
  (preprocess (pps lwt_ppx))
  (instrumentation (backend bisect_ppx)))
diff --git a/src/http/shared/websocket.ml b/src/http/shared/websocket.ml
index 358d577..fab93a9 100644
--- a/src/http/shared/websocket.ml
+++ b/src/http/shared/websocket.ml
@@ -134,6 +134,7 @@ let websocket_handler stream socket =
             log "Unknown frame type with length %i" length); *)
           read ~data ~flush ~ping ~pong ~close ~exn
         | Some (`Data properties, payload) ->
+          prerr_endline "assigning current_payload";
           current_payload := Some (properties, payload);
           read ~data ~flush ~ping ~pong ~close ~exn
         end
@@ -141,6 +142,7 @@ let websocket_handler stream socket =
         Websocketaf.Payload.schedule_read
           payload
           ~on_read:(fun buffer ~off ~len ->
+            Printf.eprintf "on_read: %s %i\n%!" (Cstruct.to_string ~off ~len:16 (Cstruct.of_bigarray buffer)) len;
             match !last_chunk with
             | None ->
               last_chunk := Some (buffer, off, len);
@@ -150,6 +152,7 @@ let websocket_handler stream socket =
               let binary = binary = `Binary in
               data last_buffer last_offset last_length binary false)
           ~on_eof:(fun () ->
+            prerr_endline "clearning current_payload";
             current_payload := None;
             match !last_chunk with
             | None ->
diff --git a/src/pure/message.ml b/src/pure/message.ml
index e96da42..86422c0 100644
--- a/src/pure/message.ml
+++ b/src/pure/message.ml
@@ -315,6 +315,7 @@ let receive_fragment stream =
         in
         let text_or_binary = if binary then `Binary else `Text in
         let end_of_message = if fin then `End_of_message else `Continues in
+        Printf.eprintf "data prefix: %s\n%!" (String.sub string 0 16);
         Lwt.wakeup_later
           resolver (Some (string, text_or_binary, end_of_message)))
 
@@ -344,16 +345,20 @@ let receive_full stream =
     | None ->
       Lwt.return (Some (acc, text_or_binary))
     | Some (fragment, _, `End_of_message) ->
+      Printf.eprintf "End continuations %i %s\n%!" (String.length fragment) (String.sub fragment 0 16);
       Lwt.return (Some (acc ^ fragment, text_or_binary))
     | Some (fragment, _, `Continues) ->
+      Printf.eprintf "Interior continuation %i\n%!" (String.length fragment);
       receive_continuations text_or_binary (acc ^ fragment)
   in
   match%lwt receive_fragment stream with
   | None ->
     Lwt.return_none
   | Some (fragment, text_or_binary, `End_of_message) ->
+    Printf.eprintf "Full message %i\n%!" (String.length fragment);
     Lwt.return (Some (fragment, text_or_binary))
   | Some (fragment, text_or_binary, `Continues) ->
+    Printf.eprintf "First fragment %i %s\n%!" (String.length fragment) (String.sub fragment 0 16);
     receive_continuations text_or_binary fragment
 
 let receive stream =

This should apply over 313fe46. We were testing this with your repro case by opening a browser tab and asking it to send 4200 bytes to the server.

Commenting here for both this one and #230

The general situation of my websocketaf fork is that it's an experiment to check whether it could work. Unfortunately, and as you've noticed in Dream, it's clearly not in a state where it should be used at all right now (unlike h2/httpaf, which are running production workloads).

I suppose its experimental status doesn't matter much when it is indeed being used in Dream, and we should strive to make the library better.

Unfortunately, I don't have any free time to spare on websocketaf right now – there seems to be interest in using it from other folks in the community as well. I'm happy to review PRs or otherwise talk about a paid engagement to make websocketaf ready for Dream.

Hello,

I'm also hit by this bug. I'd be happy to help on it!

@anmonteiro commented here:

I believe I've addressed the source of garbled data,

However, I still see the problem occurring on Dream (master branch), which seems to include all commits from @anmonteiro's fork. Am I missing something that would have addressed the garbled data?

If the issue is still open, I would appreciate some pointers on how to investigate it!