kffl / speedbump

TCP proxy for simulating variable, yet predictable network latency :globe_with_meridians::hourglass_flowing_sand:

Adding symmetrical delay (feature request)

n1000 opened this issue

Hi. First off, thanks for sharing this utility; it has been great for testing out the netcode in my hobby game project.

One thing I noticed during testing was that the delay seems to be added in just a single direction, leading to highly asymmetric one-way delays.

For example, with a 500ms latency setting connecting two netcat endpoints:

```
Node A: nc localhost 2500
Node B: speedbump --latency=500ms --port=2500 localhost:5555
Node C: nc -l -p 5555
```

I see that ~500ms of latency is added from A->C (confirmed with Wireshark on the loopback interface). However, from C->A there seems to be no additional latency added.

Is there a mode where I can add a symmetrical 250ms of latency in each direction, for a total round-trip time of 500ms?

In practice, the forward and backward one-way delays would not be exactly equal. In extreme cases, they may differ considerably (for instance on satellite internet, or a home internet connection heavily loaded in just one direction). However, on good network connections, I think treating the latency as roughly symmetric is reasonable...

Do you think this would be a generally useful option to have? I would love to hear your thoughts on it.

I ran the above test using the code at git commit a556c99.

Hi @n1000,

Thanks for reporting this issue. I'm happy to hear that you have found this program useful for your project.

You are right: the delay queue in which the read buffers are stored is only used in the client -> destination direction, not the other way around. This is by design, as the primary use case that I had in mind when developing this project was to aid in testing application metrics collection stacks (e.g. Prometheus + Grafana) by introducing very predictable, artificial latency.
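For illustration, here is a minimal sketch of that one-directional behaviour (in Go, but this is not speedbump's actual code; the port numbers, names, and fixed sleep are simplifications of the real delay queue):

```go
package main

import (
	"io"
	"log"
	"net"
	"time"
)

// delayedCopy forwards data from src to dst, sleeping before each
// write to simulate one-way latency. A real implementation would
// queue buffers with per-buffer release times instead of sleeping,
// so that reads are never stalled.
func delayedCopy(dst, src net.Conn, delay time.Duration) {
	defer dst.Close()
	buf := make([]byte, 64*1024)
	for {
		n, err := src.Read(buf)
		if n > 0 {
			time.Sleep(delay)
			if _, werr := dst.Write(buf[:n]); werr != nil {
				return
			}
		}
		if err != nil {
			return
		}
	}
}

func main() {
	ln, err := net.Listen("tcp", ":2500")
	if err != nil {
		log.Fatal(err)
	}
	for {
		client, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		upstream, err := net.Dial("tcp", "localhost:5555")
		if err != nil {
			log.Fatal(err)
		}
		// client -> upstream is delayed...
		go delayedCopy(upstream, client, 500*time.Millisecond)
		// ...while upstream -> client is forwarded as-is, which is
		// exactly the asymmetry observed in the netcat test above.
		go func(c, u net.Conn) {
			_, _ = io.Copy(c, u)
			c.Close()
		}(client, upstream)
	}
}
```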

Here is a brief description of my motivation and use cases:

When setting up application metrics collection and visualization (e.g. via Prometheus + Grafana), I've often found myself trying to introduce artificial latency within the instrumented system in order to generate more interesting time series data for testing a given monitoring solution. Even when running load tests against an instrumented system, the data plotted on Grafana dashboards was often rather boring, making it difficult to catch bugs in PromQL queries due to the lack of immediate visual feedback. I figured that one way of adding predictable variability to the instrumented application's metrics would be to introduce variable latency between it and its upstream services (e.g. databases, message brokers or other services called synchronously).

Example use case:

Imagine that you have instrumented your app with a Prometheus client so that it collects the latency of DB queries in a histogram, and that you are now building a Grafana dashboard to visualize these metrics. If you knew that the DB query latency over time should form a sine wave with a period of 2 minutes and an amplitude of 10ms, it would be much easier to validate the correctness of metrics collection and visualization (you would know exactly what to look for on the latency histogram/graph).
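As a sketch of how such a predictable signal can be generated (my own illustration; the base value and function names are made up and are not speedbump's API):

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// sineLatency returns the latency to inject after `elapsed` time has
// passed: a base value plus a sine wave with the given amplitude and
// period, matching the 10ms / 2 min example above.
func sineLatency(base, amplitude, period, elapsed time.Duration) time.Duration {
	phase := 2 * math.Pi * float64(elapsed) / float64(period)
	return base + time.Duration(float64(amplitude)*math.Sin(phase))
}

func main() {
	for t := 0; t <= 120; t += 15 {
		elapsed := time.Duration(t) * time.Second
		fmt.Printf("t=%3ds -> %v\n", t,
			sineLatency(100*time.Millisecond, 10*time.Millisecond, 2*time.Minute, elapsed))
	}
}
```

Since the injected latency is a pure function of elapsed time, you know exactly what the resulting Grafana graph should look like.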

Introducing a delay queue in both directions may not be desirable in the use case that I've described, as the data being sent from the client to the server could be assigned a different latency value than the data being sent back from the server to the client. Consequently, the added end-to-end latency from the client's point of view would look like the sum of two latency waves (e.g. sine or sawtooth), with a phase shift applied to one of them (the value of which depends on the processing time of the server), which is nowhere near as predictable as a single latency wave.
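To make that concrete with a simplified model (my own illustration, assuming both directions inject the same sine wave of amplitude $A$ and angular frequency $\omega$, and that the server adds a constant phase offset $\phi$):

$$
A\sin(\omega t) + A\sin(\omega t + \phi) = 2A\cos\left(\tfrac{\phi}{2}\right)\sin\left(\omega t + \tfrac{\phi}{2}\right)
$$

Even in this idealized case, the observed amplitude is scaled by $2\cos(\phi/2)$ rather than simply doubled, and since the server's processing time (and hence $\phi$) varies from request to request, the round-trip signal would no longer track the configured wave.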

Having said that, there likely are other use cases for speedbump in which such behaviour would be desirable. My proposal would be to add two additional latency generation modes, so that there are three in total (a rough sketch follows the list):

  • only source->destination latency (currently implemented)
  • only destination->source latency
  • both source->destination and destination->source
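Purely as a hypothetical sketch of what such a mode could look like internally (not an actual speedbump API; all names here are made up):

```go
package main

import "fmt"

// Direction selects which traffic direction(s) pass through the delay queue.
type Direction int

const (
	SrcToDst Direction = iota // current behaviour
	DstToSrc
	Both
)

// shouldDelay reports whether a buffer travelling in the given
// direction should be routed through the delay queue.
func shouldDelay(mode Direction, srcToDst bool) bool {
	switch mode {
	case SrcToDst:
		return srcToDst
	case DstToSrc:
		return !srcToDst
	default: // Both
		return true
	}
}

func main() {
	fmt.Println(shouldDelay(Both, false)) // true: delayed in both directions
}
```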

Also, I should have described the default behaviour that you have encountered in the project's README. Since the documentation could use some improvements, I've added a separate issue to address that (#23).

Hi @kffl, your proposal sounds great. Thanks for taking the time to respond to this feature request.