schweikert / fping

High performance ping tool

Home Page: https://fping.org

not enough sequence numbers available! (expire_timeout=10000000000, host_nr=0, ping_count=0, seqmap_next_id=0)

cagney opened this issue · comments

On FreeBSD (14.0, but also 13.x), fping gets this error:

+# fping  -c 1  --timeout 1s   --src 192.0.1.254 192.0.2.254
+fping error: not enough sequence numbers available! (expire_timeout=10000000000, host_nr=0, ping_count=0, seqmap_next_id=0)

when it is run immediately after a boot (and I really mean immediately; it's from a test framework). Adding a sleep 10 before running fping seems to fix the problem.

I suspect:

    /* check if expired (note that unused seqmap values will have fields set to
     * 0, so will be seen as expired */
    next_value = &seqmap_map[seqmap_next_id];
    if (timestamp - next_value->ping_ts < SEQMAP_TIMEOUT_IN_NS) {
        fprintf(stderr, "fping error: not enough sequence numbers available! (expire_timeout=%" PRId64 ", host_nr=%d, ping_count=%d, seqmap_next_id=%d)\n",
            SEQMAP_TIMEOUT_IN_NS, host_nr, ping_count, seqmap_next_id);
        exit(4);
    }

where, immediately after a boot, timestamp is small, which means timestamp - 0 /* ping_ts */ is less than SEQMAP_TIMEOUT_IN_NS, i.e., 10 s if my math is correct. The fix would be to set .ping_ts to some equivalent of the epoch.
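A minimal sketch of the arithmetic described above, assuming the clock starts near zero at boot. The helper name slot_looks_in_use is hypothetical and only mirrors the comparison in the quoted snippet; it is not part of fping.

```c
#include <stdbool.h>
#include <stdint.h>

/* 10 s in nanoseconds, matching expire_timeout=10000000000 in the error message. */
#define SEQMAP_TIMEOUT_IN_NS INT64_C(10000000000)

/* Hypothetical helper mirroring the quoted check: returns true when the
 * slot would be treated as still in use (not yet expired). */
static bool slot_looks_in_use(int64_t timestamp, int64_t ping_ts)
{
    return timestamp - ping_ts < SEQMAP_TIMEOUT_IN_NS;
}
```

With an unused slot (ping_ts == 0) and a timestamp of 3 s after boot, the difference is 3 s, which is below the 10 s timeout, so the never-used slot is wrongly reported as busy. Once the clock has passed 10 s, the same slot is seen as expired, which would explain why the sleep 10 helps.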

I also suspect #217 was wrong.

Unfortunately I cannot reproduce the error, but I think it is related to the jumping time mentioned by hmh in the pull-request, which can occur with CLOCK_REALTIME.
Presumably NTP is used on the system?

The patch is unfortunately not very clean in this respect, but should primarily correct the time output, which it does.
Unfortunately, it seems to generate unwanted subsequent errors in certain constellations.

I'll take a look at the whole thing and look for a better solution. HMH has already mentioned an approach here:

and use the CLOCK_MONOTONIC delta + CLOCK_REALTIME timestamp to calculate a more sane real time that doesn't jump around.

With this approach, the time output becomes less accurate the longer the system has been running, but the measurement result is not falsified.
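A rough sketch of that idea, assuming a POSIX system with clock_gettime(). The struct clock_base and the function names are hypothetical and not taken from fping: both clocks are sampled once at startup, and later wall-clock values are derived from the monotonic delta only.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical: one paired sample of both clocks, taken at startup. */
struct clock_base {
    int64_t real_ns; /* CLOCK_REALTIME at startup */
    int64_t mono_ns; /* CLOCK_MONOTONIC at startup */
};

static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * INT64_C(1000000000) + ts->tv_nsec;
}

/* Sample both clocks once; returns 0 on success, -1 on failure. */
static int clock_base_init(struct clock_base *base)
{
    struct timespec real, mono;

    if (clock_gettime(CLOCK_REALTIME, &real) != 0
        || clock_gettime(CLOCK_MONOTONIC, &mono) != 0) {
        return -1;
    }
    base->real_ns = ts_to_ns(&real);
    base->mono_ns = ts_to_ns(&mono);
    return 0;
}

/* Wall-clock estimate that cannot jump: startup real time plus the
 * monotonic time elapsed since startup. */
static int64_t stable_realtime_ns(const struct clock_base *base, int64_t mono_now_ns)
{
    return base->real_ns + (mono_now_ns - base->mono_ns);
}
```

The tradeoff is the one mentioned above: any later correction of CLOCK_REALTIME (for example by NTP) is not reflected, so the reported time can drift from the true wall clock, while the measured deltas stay consistent.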

Can you please check whether the problem is also present with the version at a3d991b?
Of course, this is not the final version. I just want to know whether the error is gone.

Unfortunately I cannot reproduce the error,

Partly my fault.

It turns out that the just-released FreeBSD 14 hasn't updated its fping package (it's still 5.0), so it doesn't include the change for #203. That change hides the problem with timestamp - next_value->ping_ts < SEQMAP_TIMEOUT_IN_NS because the timestamp is never close to zero.

I added pull request #290 as a possible solution.

As a different way to avoid problems with CLOCK_MONOTONIC starting with low values, i.e., below SEQMAP_TIMEOUT_IN_NS, on some operating systems, could we not initialize all ping_ts fields in the seqmap_map to -SEQMAP_TIMEOUT_IN_NS in seqmap_init()?

Of course, that would not address the problem that CLOCK_MONOTONIC is not useful for reporting of "real" time values on OpenBSD, FreeBSD, and macOS. Using CLOCK_REALTIME for, e.g., -D, --timestamp output, and CLOCK_MONOTONIC for time deltas would still be required to make CLOCK_MONOTONIC usable on those operating systems.
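A sketch of that initialization, under stated assumptions: the struct layout is simplified, SEQMAP_ENTRIES is a hypothetical name for the table size, and the function is named seqmap_init_sketch to make clear it is not the actual fping code.

```c
#include <stdint.h>
#include <stdlib.h>

#define SEQMAP_TIMEOUT_IN_NS INT64_C(10000000000)
/* Assumption: table size chosen for illustration only. */
#define SEQMAP_ENTRIES 65536

/* Simplified stand-in for the real seqmap entry type. */
typedef struct {
    unsigned int host_nr;
    unsigned int ping_count;
    int64_t ping_ts;
} seqmap_value;

static seqmap_value *seqmap_map;

/* Allocate the table and mark every entry as already expired, so that
 * timestamp - ping_ts >= SEQMAP_TIMEOUT_IN_NS holds for any timestamp >= 0. */
static int seqmap_init_sketch(void)
{
    seqmap_map = calloc(SEQMAP_ENTRIES, sizeof(*seqmap_map));
    if (seqmap_map == NULL) {
        return -1;
    }
    for (size_t i = 0; i < SEQMAP_ENTRIES; i++) {
        seqmap_map[i].ping_ts = -SEQMAP_TIMEOUT_IN_NS;
    }
    return 0;
}
```

The extra loop is needed because calloc() only zero-fills, and zero is exactly the value that looks "recently used" while the clock is still below the timeout.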

@auerswal your suggestion is the better solution and should prevent the error message under FreeBSD directly after the system boot.
I have implemented this in pull request #306.

I think there is an efficiency tradeoff in two possible solutions to the spurious not enough sequence numbers available error:

  1. We can initialize the complete seqmap data structure's ping_ts up front to -SEQMAP_TIMEOUT_IN_NS. This always takes the same amount of extra work independent of the number of values later stored in the data structure while fping runs.
  2. We can add a != 0 test to the code checking if the next seqmap entry can be used, as proposed by @cagney as part of pull request #290. Initially, before all seqmap entries have been used, the != 0 check suffices, and one arithmetic operation is theoretically avoided. After all 65Ki seqmap entries have been used, every entry has a non-zero ping_ts, so the extra check no longer saves anything and the timeout comparison always runs as well. For short fping runs using only a few pings, this is probably more efficient than initializing the whole data structure with a complicated value (i.e., calloc() does not suffice). With super-scalar out-of-order CPUs, both checks may even run in parallel, making the extra != 0 test practically free.

Both solutions avoid spurious not enough sequence numbers available errors, and the second one might be more efficient, but I am fine with both approaches.
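The second option can be sketched as follows. This is an illustration of the idea only, not the code from pull request #290; the helper name slot_in_use_checked is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEQMAP_TIMEOUT_IN_NS INT64_C(10000000000)

/* Hypothetical helper: a zero ping_ts marks a never-used slot, so the
 * timeout comparison is only evaluated for slots that were actually used. */
static bool slot_in_use_checked(int64_t timestamp, int64_t ping_ts)
{
    return ping_ts != 0 && timestamp - ping_ts < SEQMAP_TIMEOUT_IN_NS;
}
```

With this variant a zero-filled table from calloc() is enough, at the cost of one extra comparison per lookup once every slot has been used.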

That's right, here are the CPU times for the two approaches:

  1. [DEBUG] CPU time used: 0.001433 sec
  2. [DEBUG] CPU time used: 0.000466 sec

The debug output has been extended in commit gsnw@2b588c1.
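For readers who want to produce a comparable figure, one common way to measure process CPU time in standard C is clock(). This is an assumption about how such a number could be obtained, not a description of what the commit does; the helper name cpu_seconds is hypothetical.

```c
#include <time.h>

/* Hypothetical helper: convert two clock() samples into elapsed CPU seconds. */
static double cpu_seconds(clock_t start, clock_t end)
{
    return (double)(end - start) / (double)CLOCKS_PER_SEC;
}
```

Typical use would be to call clock() once at program start and once before exit, then print cpu_seconds(start, end) with a format such as "%f sec". Single runs in the sub-millisecond range are noisy, so repeated runs give a more reliable comparison.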

@schweikert you can use one of the pull requests #306 or #307.
Preferably use #307, because the other one is a bit worse in terms of CPU time.
I will close the remaining pull request afterwards.

@schweikert The issue can be closed as solved.