shadow / shadow

Shadow is a discrete-event network simulator that directly executes real application code, enabling you to simulate distributed systems with thousands of network-connected processes in realistic and scalable private network experiments using your laptop, desktop, or server running Linux.

Home Page: https://shadow.github.io


Large socket buffers with fifo queueing cause sockets to stall

stevenengler opened this issue · comments

If a host with a slow network has two sockets with large send buffers, one socket can prevent the other socket from sending data for a long period of time.

When the network interface can send a packet and fifo queueing is used (the default), it chooses the packet with the lowest priority. The packet priority is set when the application calls send(), so data from earlier send()s is always sent before data from later send()s. This can be a problem if a socket has a large send buffer that gets filled by the application: any other sockets on the host must wait for that socket to send all of its packets before they can send their own. In extreme cases, this can cause a socket to not send any packets for over 2 minutes. This is an issue with Shadow's implementation of fifo queueing, and it also applies to UDP sockets.
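To make the starvation concrete, here is a toy model of that behaviour (my own sketch, not Shadow's actual code): every send() stamps its packets with a value from a global counter, and the interface always transmits the packet with the lowest stamp, so a socket that fills its buffer first starves a socket that writes later. The socket names and packet counts below are made up.

import heapq
import itertools

next_priority = itertools.count()
interface = []  # the host's single fifo interface queue: (priority, socket name)

def app_send(sock_name, num_packets):
    # model an application send(): each packet gets the next global priority
    for _ in range(num_packets):
        heapq.heappush(interface, (next(next_priority), sock_name))

app_send("socket A", 1000)  # A fills its large send buffer first
app_send("socket B", 10)    # B writes later, so all of its packets rank behind A's

sent_from_a = 0
while interface:
    _, sock_name = heapq.heappop(interface)  # lowest priority is transmitted first
    if sock_name == "socket B":
        print(f"socket B's first packet leaves only after {sent_from_a} packets from socket A")
        break
    sent_from_a += 1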

For example, here we have two sockets on a slow host (up/down bandwidth of 500 Kbit/s). Each socket is given a 100 MiB send buffer, to which a total of 10 MiB is written. One socket will have all of its data prioritized over the other. The client script (client.py), server script (server.py), and Shadow configuration are shown below.

client.py:

import socket
import sys
import time

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# set send buffer length of 100 MiB
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 100*(1024**2))

s.connect((sys.argv[1], int(sys.argv[2])))

# send 10 MiB
size = 10*(1024**2)
buf = b'1'*size

while len(buf) != 0:
    num = s.send(buf)
    buf = buf[num:]

# try to work around https://github.com/shadow/shadow/issues/3100
time.sleep(10)

s.shutdown(socket.SHUT_WR)
assert b'' == s.recv(1)
s.close()

server.py:

import asyncio
import socket
import sys

async def handle_client(client, addr):
    loop = asyncio.get_event_loop()

    # read and discard everything the client sends until it shuts down its side
    while True:
        buf = await loop.sock_recv(client, 1024)
        if len(buf) == 0:
            break

    print("Connection closed:", addr)
    client.close()

async def run_server(port):
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(('0.0.0.0', port))
    server.listen(100)
    server.setblocking(False)

    loop = asyncio.get_event_loop()

    while True:
        client, addr = await loop.sock_accept(server)
        print("New connection:", addr)
        loop.create_task(handle_client(client, addr))

asyncio.run(run_server(int(sys.argv[1])))

Shadow config:

general:
  stop_time: 360s

experimental:
  strace_logging_mode: standard

host_option_defaults:
  pcap_enabled: true

network:
  graph:
    type: gml
    inline: |
      graph [
        node [
          id 0
          host_bandwidth_down "500 Kbit"
          host_bandwidth_up "500 Kbit"
        ]
        edge [
          source 0
          target 0
          latency "50 ms"
        ]
      ]

hosts:
  server:
    network_node_id: 0
    processes:
    - path: /usr/bin/python3
      args: -u ../../../server.py 8080
      start_time: 0s
      expected_final_state: running
  client:
    network_node_id: 0
    processes:
    - path: /usr/bin/python3
      args: -u ../../../client.py server 8080
      start_time: 1s
    - path: /usr/bin/python3
      args: -u ../../../client.py server 8080
      start_time: 1s

If we look at the client's pcap, we can see that one of the sockets had a period of ~173 seconds where it didn't send any packets.

[screenshot: packet capture showing the ~173 second gap in one socket's traffic]

It was only able to send packets after the other socket had finished sending all of its packets.

[screenshot: packet capture showing the stalled socket resuming after the other socket finished]
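That gap is roughly what you would expect for one socket to push 10 MiB through the 500 Kbit/s uplink before the other socket gets a turn. A quick back-of-the-envelope check (my own estimate, assuming "500 Kbit" means 500,000 bits per second and ignoring TCP/IP header overhead, which would add a few more seconds):

payload_bits = 10 * 1024 * 1024 * 8  # 10 MiB written by the first socket
link_bps = 500 * 1000                # "500 Kbit" uplink
print(payload_bits / link_bps)       # ~167.8 seconds, close to the observed ~173 s stall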