amphp / process

An async process dispatcher for Amp.

Moderate to high concurrency load causes "Failed to launch process" errors on Windows

Bilge opened this issue · comments

commented

Using Artax (composer require amphp/artax) and running the following code on Windows:

    use Amp\Loop;

    Loop::run(function () {
        // $this->client is an Artax HTTP client instance (e.g. Amp\Artax\DefaultClient)
        foreach (range(1, 1000) as $id) {
            $this->client->request('http://example.com')->onResolve(function ($error, $response) {
                if ($error) {
                    echo "$error\n";

                    return;
                }
            });
        }
    });

Generates the following error:

Amp\Process\ProcessException: Process did not connect to server before timeout elapsed in vendor\amphp\process\lib\Internal\Windows\SocketConnector.php:344

The key is to have a high concurrency limit (i.e. 1000 in this case). Lower limits can trigger it, too, but they need to be at least in the region of ~500 on my Windows 7 system.

The error is strange because only a couple of seconds are passing, at most, before the errors start streaming in. The same code works (can successfully download a response) with lower concurrency limits (e.g. around 200 or so).


Edit: After further testing, I can trigger this with a limit as low as 50. It seems to depend on how long the destination server takes to respond; loading larger pages is more likely to cause it to fail. Again, only a second or so of real user time passes, but even trying to load 50 heavy URLs at once causes it to fail the majority of the time, and there seems to be no way to override the timeout.

The number of concurrent requests shouldn't affect this, nor should the size of the responses. Another process should only be used for reading the hosts file on Windows and for reading several registry entries for the DNS configuration.
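For illustration, here is a minimal sketch of what such a registry read looks like when done through this library. It assumes the amphp/process 1.x API and the buffer() helper from amphp/byte-stream; the actual lookup is performed by the DNS config loader (via amphp/windows-registry), not by application code like this:

    use Amp\Loop;
    use Amp\Process\Process;
    use function Amp\ByteStream\buffer;

    Loop::run(function () {
        // Spawn reg.exe and buffer its standard output, mirroring the
        // "reg query" command that shows up in the logs below.
        $process = new Process('reg query "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"');

        yield $process->start();                        // launch the child process
        $output = yield buffer($process->getStdout());  // read everything it prints
        yield $process->join();                         // wait for it to exit

        var_dump($output);
    });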

commented

The number of concurrent requests shouldn't affect this

I'm sure you're correct, but empirically you're wrong. If $limit is the upper bound of the range() then I can tell you I've clearly observed that:

  • For 1 < $limit < 10 the code never times out
  • For 50 < $limit < 200 the code sometimes times out
  • For $limit === 1000 the code always times out

So whether it betrays your expectations or not, the number of concurrent requests directly affects this. Moreover, changing the timeout from one second to 30 seconds, as in the linked PR, causes all three cases to pass without timeout errors.

I'm just referring to what the code should do, not what it will do. I didn't say I'm sure it's bug-free. It could well be that too many processes are spawned, for example, and in that case such a timeout is pretty much expected.

Could you put some logging code into the runner and see which commands are launched?

commented

Could you put some logging code into the runner and see which commands are launched?

I'm not sure what you're asking me to do. Could you be more specific?

Put e.g. fwrite(STDERR, "command: $command\r\n"); on the line after the one I linked and run the reproduction script.

commented

One important thing I should point out is that I just tried running the script with Xdebug disabled and it passes for $limit circa 500. It is possible that Xdebug is the cause of this issue.


Edit: Never mind. $limit == 1000 still times out, even with Xdebug turned off. Disabling Xdebug seems to afford it more "bandwidth" but does not fix the issue.

commented

@kelunik The only thing that shows up when it fails due to every request timing out is the following, just after the first request is made.

command: reg query "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

limit108.txt


When $limit is low enough for the requests to pass without timing out, the following additional output is shown after all the requests have been queued and before the first response is received:

command: reg query "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
command: "C:\PHP 7.1\php.exe" -dhtml_errors=0 -ddisplay_errors=0 -dlog_errors=1 "vendor\amphp\parallel\lib\Context/Internal/process-runner.php" "vendor\amphp\parallel\lib\Worker/Internal/worker-process.php" "Amp\Parallel\Worker\BasicEnvironment"

limit54.txt

@Bilge Does increasing the timeout to 3 seconds already work for very high concurrency loads?

commented

Yes, but only up to a point. I can get ~500 concurrent with a 3 second timeout, but not ~1000.

@Bilge Do you have a profiler like Blackfire available? If so, could you please share a profile with 1 and 1000 requests?

commented

Happy to help in any way I can but I don't know how to do that. I understand most cool hackers don't use Windows, but there's only so much I can do via GitHub comments. Ultimately an expert such as yourself is probably going to need direct access to Windows to get to the bottom of a fundamentally complex or esoteric issue such as this.

See https://blackfire.io/docs/introduction for an introduction. If you can't or don't want to do that, that's entirely fine. Just be aware that it will delay this issue's resolution. I think you can do similar things with Xdebug.

It depends a lot on the root cause of the issue. If it's an issue in the process wrapper, @DaveRandom will indeed need to look into this. If it's not, a profile might be enough to tell where the issue is.
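For reference, a hedged example of producing such a profile with Xdebug 2 (which matches the PHP 7.1 setup shown earlier); the script name and output directory here are placeholders:

    php -dxdebug.profiler_enable=1 -dxdebug.profiler_output_dir=C:\temp repro.php

Each run then writes a cachegrind.out.* file to that directory, which a cachegrind viewer such as QCacheGrind can open and which could be attached to this issue.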