stdio buffered writes (chunked) issues & process.exit() truncation

eljefedelrodeodeljefe opened this issue 8 years ago · comments

Robert Jefe Lindstädt commented 8 years ago

If this is currently breaking your program, please use this temporary fix:

[process.stdout, process.stderr].forEach((s) => {
  s && s.isTTY && s._handle && s._handle.setBlocking &&
    s._handle.setBlocking(true)
})

Version: v6, (likely all and backportable)
Platform: all
Subsystem: process

As noted in #6297, async stdio will not be flushed upon an immediate process.exit(). This may expose general deficiencies around C exit() being called from C++ functions without the stack being properly unwound, and is probably not just introduced by the latest libuv updates. Adding flushing, providing a graceful exit, and/or improving the unwinding of C++ stacks should be considered.

cc @jasnell, @kzc, @Qix-, @bnoordhuis

Issues

Discussion has already been taking place in several places, e.g. #6297, #6456, #6379

Summaries of Proposals

Proposals are not mutually exclusive and could lead to semantically unrelated contributions.

aid with process.stdout.flush()
process.setBlocking(true)
node --blocking-stdio
longjmp() towards main at exit in C++
move parts of process.exit() / process.reallyExit() to new method os.exit()
golang panic()- or c++ throw-like stack unwinding

Discussions by Author (with content)

@ChALkeR
I tried to discuss this some time ago at IRC, but postponed it for quite a long time. Also I started the discussion of this in #1741, but I would like to extract the more specific discussion to a separate issue.

I could miss some details, but will try to give a quick overview here.

Several issues here:

Many calls to console.log (e.g. calling it in a loop) could chew up all the memory and die — #1741, #2970, #3171.
console.log has different behavior while printing to a terminal and being redirected to a file. — #1741 (comment).
Output is sometimes truncated — #6297, there were other ones as far as I remember.
The behaviour seems to differ across platforms.

As I understand it — the output has an implicit write buffer (as it's non-blocking) of unlimited size.

One approach to fixing this would be to:

Introduce an explicit cyclic write buffer.
Make writes to that cyclic buffer blocking.
Make writes from the buffer to the actual output non blocking.
When the cyclic buffer reaches its maximum size (e.g. 10 MiB), block further writes to the buffer until a corresponding part of it is freed.
On (normal) exit, make sure the buffer is flushed.

For almost all cases, except for the ones that are currently broken, this would behave as a non-blocking buffer (because writes to the buffer are considerably faster than writes from the buffer to file/terminal).

For cases when the data is being piped to the output too quickly and when the output file/terminal does not manage to output it at the same rate — the write would turn into a blocking operation. It would also be blocking at the exit until all the data is written.

Another approach would be to monitor (and limit) the size of data that is contained in the implicit buffer coming from the async queue, and make the operations block when that limit is reached.
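The second approach (limiting how much data sits in the buffer) can be approximated in user land today. The following is a minimal sketch, assuming a standard Writable stream: it relies on the return value of write() and on the 'drain' event. The name writeAll is hypothetical and not part of Node.js, and the promise only signals that every chunk was handed to the stream, not that the OS has written it.

```javascript
// Hypothetical helper (not a Node.js API): hands every chunk to `stream`,
// pausing whenever write() reports that the internal buffer is full.
function writeAll(stream, chunks) {
  return new Promise((resolve, reject) => {
    let i = 0;
    stream.once('error', reject);

    function next() {
      while (i < chunks.length) {
        const ok = stream.write(chunks[i++]);
        if (!ok) {
          // Buffered data reached highWaterMark: wait for 'drain' first.
          stream.once('drain', next);
          return;
        }
      }
      // All chunks were accepted by the stream; resolve with their count.
      resolve(i);
    }

    next();
  });
}
```

A caller could then use writeAll(process.stdout, lines).then(() => { process.exitCode = 0; }) instead of calling console.log in a tight loop, which keeps the amount of queued data bounded.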

Josh Junon · Answer 1 · Fri Apr 29 2016 06:50:25 GMT+0800 (China Standard Time)

Perhaps a list of issues this would address and/or close would be helpful to include since this seems to be a sprawling issue with a lot of fragmented discussion.

Robert Jefe Lindstädt · Answer 2 · Fri Apr 29 2016 06:52:36 GMT+0800 (China Standard Time)

Yes, just a little late in Europe :( keep 'em coming and I add them above.

Vse Mozhe Buty · Answer 3 · Fri Apr 29 2016 06:55:05 GMT+0800 (China Standard Time)

Also see #6410

Vse Mozhe Buty · Answer 4 · Fri Apr 29 2016 07:11:07 GMT+0800 (China Standard Time)

Considering all the clarification in the #6410, is there also a theoretical possibility that not only several I/O calls to stdout could not make it, but even one simple console.log() before process.exit() could be truncated or discarded?

Josh Junon · Answer 5 · Fri Apr 29 2016 07:13:23 GMT+0800 (China Standard Time)

@vsemozhetbyt that is especially correct if I'm understanding your question correctly.

Anna Henningsen · Answer 6 · Fri Apr 29 2016 07:14:34 GMT+0800 (China Standard Time)

@vsemozhetbyt If it’s big enough, definitely. See e.g. test/known_issues/test-stdout-buffer-flush-on-exit.js.

Robert Jefe Lindstädt · Answer 7 · Fri Apr 29 2016 07:18:20 GMT+0800 (China Standard Time)

To reproduce you can do

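// Reproduction: print a very large string, then exit immediately.
require('crypto').randomBytes(100000000, function(err, buffer) {
  if (err) throw err;
  var token = buffer.toString('hex');
  console.log(token);  // queued asynchronously when stdout is non-blocking
  process.exit(0);     // may exit before the queued output is written
});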

Edit: @addaleax's hint: test does a similar thing. Sorry @addaleax

kzc · Answer 8 · Fri Apr 29 2016 07:24:01 GMT+0800 (China Standard Time)

@vsemozhetbyt This output is truncated with node 6.0.0 on Mac after approx 40 lines:

node -e 'console.log("The quick brown fox jumps.\n".repeat(40000)); process.exit(7);'

node 5.x and earlier output all 40000 lines on Mac.

Vse Mozhe Buty · Answer 9 · Fri Apr 29 2016 07:44:18 GMT+0800 (China Standard Time)

So now, if a user does not want to restructure the code, all one can do is write something like

const err = {name: 'Error', message: 'something wrong'};
throw err;

instead of

console.log('Error: something wrong');
process.exit(1);

and to deal with all the uncontrolled clutter of debug output?

kzc · Answer 10 · Fri Apr 29 2016 07:49:28 GMT+0800 (China Standard Time)

throw err;
and to deal with all the uncontrolled clutter of debug output?

For dev code, sure, but uncaught exceptions in production code are not very elegant or professional.

kzc · Answer 11 · Fri Apr 29 2016 08:07:08 GMT+0800 (China Standard Time)

Related: #6379

Also discusses process.stdio.setBlocking(Boolean)

Robert Jefe Lindstädt · Answer 12 · Fri Apr 29 2016 15:47:55 GMT+0800 (China Standard Time)

added @chalkers thread and updated this issue with some summaries and stuff.

Josh Junon · Answer 13 · Fri Apr 29 2016 16:09:35 GMT+0800 (China Standard Time)

@kzc

node 5.x and earlier output all 40000 lines on Mac.

Not sure what you're talking about.

#!/usr/bin/env bash
. ~/.nvm/nvm.sh

uname -a
echo

function do_buffer_test {
    node <<< 'console.log((new Array(40000)).join("Hello! this is a test!\n"));' | wc -l
    node <<< 'console.log((new Array(40000)).join("Hello! this is a test!\n")); process.exit(1)' | wc -l
    node <<< 'console.log((new Array(40000)).join("Hello! this is a test!\n")); process.reallyExit(1)' | wc -l
    node <<< 'console.log((new Array(40000)).join("Hello! this is a test!\n")); process.abort()' | wc -l
    node <<< 'for (var i = 0; i < 40000; i++) console.log("Hello! this is a test!");' | wc -l
    node <<< 'for (var i = 0; i < 40000; i++) console.log("Hello! this is a test!"); process.exit(1)' | wc -l
    node <<< 'for (var i = 0; i < 40000; i++) console.log("Hello! this is a test!"); process.reallyExit(1)' | wc -l
    node <<< 'for (var i = 0; i < 40000; i++) console.log("Hello! this is a test!"); process.abort()' | wc -l
}

nvm install 0.10
do_buffer_test

nvm install 0.12
do_buffer_test

nvm install 1
do_buffer_test

nvm install 2
do_buffer_test

nvm install 3
do_buffer_test

nvm install 4
do_buffer_test

nvm install 5
do_buffer_test

nvm install 6
do_buffer_test

$ ./test-buffers.sh
Darwin JunonBox.local 15.4.0 Darwin Kernel Version 15.4.0: Fri Feb 26 22:08:05 PST 2016; root:xnu-3248.40.184~3/RELEASE_X86_64 x86_64

v0.10.44 is already installed.
Now using node v0.10.44 (npm v2.15.0)
   40000
   40000
   40000
   40000
   40000
   40000
   40000
   40000
v0.12.13 is already installed.
Now using node v0.12.13 (npm v2.15.0)
   40000
   40000
   40000
   40000
   40000
   40000
   40000
   40000
iojs-v1.8.4 is already installed.
Now using io.js v1.8.4 (npm v2.9.0)
   40000
    2849
    2849
    2849
   40000
   40000
   40000
   40000
iojs-v2.5.0 is already installed.
Now using io.js v2.5.0 (npm v2.13.2)
   40000
    2849
    2849
    2849
   40000
   40000
   40000
   40000
iojs-v3.3.1 is already installed.
Now using io.js v3.3.1 (npm v2.14.3)
   40000
    2849
    2849
    2849
   40000
   40000
   40000
   40000
v4.4.3 is already installed.
Now using node v4.4.3 (npm v2.15.1)
   40000
    2849
    2849
    2849
   40000
   40000
   40000
   40000
v5.11.0 is already installed.
Now using node v5.11.0 (npm v3.8.6)
   40000
    2849
    2849
    2849
   40000
   40000
   40000
   40000
v6.0.0 is already installed.
Now using node v6.0.0 (npm v3.8.6)
   40000
    2849
    2849
    2849
   40000
   40000
   40000
   40000

$ ./test-buffers.sh
Linux -snip- 3.18.27 #1 SMP Wed Feb 17 01:14:23 UTC 2016 x86_64 GNU/Linux

######################################################################## 100.0%
Now using node v0.10.44 (npm v2.15.0)
Creating default alias: default -> 0.10 (-> v0.10.44)
40000
40000
40000
40000
40000
40000
40000
40000
######################################################################## 100.0%
Now using node v0.12.13 (npm v2.15.0)
40000
40000
40000
40000
40000
40000
40000
40000
Downloading https://iojs.org/dist/v1.8.4/iojs-v1.8.4-linux-x64.tar.gz...
######################################################################## 100.0%
Now using io.js v1.8.4 (npm v2.9.0)
40000
2849
2849
2849
40000
40000
40000
40000
Downloading https://iojs.org/dist/v2.5.0/iojs-v2.5.0-linux-x64.tar.xz...
######################################################################## 100.0%
Now using io.js v2.5.0 (npm v2.13.2)
40000
2849
2849
2849
40000
40000
40000
40000
Downloading https://iojs.org/dist/v3.3.1/iojs-v3.3.1-linux-x64.tar.xz...
######################################################################## 100.0%
Now using io.js v3.3.1 (npm v2.14.3)
40000
2849
2849
2849
40000
40000
40000
40000
Downloading https://nodejs.org/dist/v4.4.3/node-v4.4.3-linux-x64.tar.xz...
######################################################################## 100.0%
Now using node v4.4.3 (npm v2.15.1)
40000
2849
2849
2849
40000
40000
40000
40000
Downloading https://nodejs.org/dist/v5.11.0/node-v5.11.0-linux-x64.tar.xz...
######################################################################## 100.0%
Now using node v5.11.0 (npm v3.8.6)
40000
2849
2849
2849
40000
40000
40000
40000
Downloading https://nodejs.org/dist/v6.0.0/node-v6.0.0-linux-x64.tar.xz...
######################################################################## 100.0%
Now using node v6.0.0 (npm v3.8.6)
40000
2849
2849
2849
40000
40000
40000
40000

It looks to me like this started happening when io.js forked. Perhaps @indutny can shed some light on the subject.

Robert Jefe Lindstädt · Answer 14 · Fri Apr 29 2016 18:15:25 GMT+0800 (China Standard Time)

Adding two other possibilities.

move parts of process.exit() / process.reallyExit() to new method os.exit(), which may fit what actually is happening better
golang panic()- or c++ throw-like stack unwinding.

kzc · Answer 15 · Fri Apr 29 2016 19:56:25 GMT+0800 (China Standard Time)

@Qix- @eljefedelrodeodeljefe Please understand that when you pipe the results you are changing the test. It follows a different code path in node and libuv. You have to observe it on the terminal. So in that regard, the test is more difficult to automate.

I am observing this behavior on Mac OS X 10.9.5. The behavior is different on Mac and Linux. Mac stdout appears to have had blocking writes to the tty historically. See #6297 (comment)

kzc · Answer 16 · Fri Apr 29 2016 20:03:57 GMT+0800 (China Standard Time)

+1 for new function os.exit(Boolean) that drains stdout/stderr upon exit and leave process.exit() as is.

Actually may have to leave process.exit as is because of the prevalence of workarounds to the stdout flushing problem such as node-exit which might break if the behavior of process.exit changes.

Saúl Ibarra Corretgé · Answer 17 · Fri Apr 29 2016 22:06:02 GMT+0800 (China Standard Time)

If memory serves right, and by looking at the results posted by @Qix-, I think this is where things started to change: libuv/libuv@b197515, because we started to open writable TTYs in non-blocking mode. Follow the commit trail for reasoning; reverting is not an option.

kzc · Answer 18 · Fri Apr 29 2016 22:15:25 GMT+0800 (China Standard Time)

reverting is not an option.

Yes, and that particular commit also introduced the tty redirection bug in src/unix/tty.c that was fixed in libuv 1.9.0.

Saúl Ibarra Corretgé · Answer 19 · Fri Apr 29 2016 22:26:35 GMT+0800 (China Standard Time)

@kzc your point being?

kzc · Answer 20 · Fri Apr 29 2016 22:31:40 GMT+0800 (China Standard Time)

Just adding weight to reverting is not an option.

kzc · Answer 21 · Fri Apr 29 2016 22:36:28 GMT+0800 (China Standard Time)

@saghul I will add that prior to the tty redirection fix, Mac stdout appeared to be blocking based on my observations with process.exit tests never truncating tty output on Mac. As of that tty redirect fix it is now non-blocking to make it on par with Linux behavior.

Ben Noordhuis · Answer 22 · Sat Apr 30 2016 00:00:34 GMT+0800 (China Standard Time)

Can I suggest closing out all other issues with a "discussion continues in #6456" comment?

James M Snell · Answer 23 · Sat Apr 30 2016 00:03:47 GMT+0800 (China Standard Time)

+1 ... we don't need multiple issues covering the same thing.

James M Snell · Answer 24 · Sat Apr 30 2016 01:57:03 GMT+0800 (China Standard Time)

See: #6477

Andrew Chalkley · Answer 25 · Sun May 01 2016 11:28:16 GMT+0800 (China Standard Time)

@eljefedelrodeodeljefe you got the wrong @ChALkeR :)

Benjamin E. Coe · Answer 26 · Wed May 11 2016 00:13:37 GMT+0800 (China Standard Time)

This is causing some fairly wonky behavior with yargs, two questions:

should we be classifying this as a bug (the new flushing behavior seems unintuitive), if so is there a separate tracking ticket I should be following?
is there a recommended workaround, or should I hold my horses for a patch.

Between commander, yargs, and optimist (all of which now exhibit broken behavior) this is going to be hitting a lot of people (about 1,600,000 installs a day).

kzc · Answer 27 · Wed May 11 2016 00:41:59 GMT+0800 (China Standard Time)

@bcoe You won't find consensus on the "process.exit() not flushing stdio" issue because many node devs don't think it's a problem.

If you must use process.exit() the only known workaround is have stdout and stderr block at application start:

process.stdout._handle.setBlocking(true);
process.stderr._handle.setBlocking(true);

Cue the "it's not supported" and "that's not the node way" rebuttals...

Robert Jefe Lindstädt · Answer 28 · Wed May 11 2016 01:10:45 GMT+0800 (China Standard Time)

@bcoe this is the tracking ticket. If possible don't do ._handle.setBlocking, since this will affect the whole process users start with yargs. There will definitely be no revert on the libuv side and this shouldn't be considered a bug. No-one has come up with a decent workaround, around flushing and properly unwinding on exit. I think it's gonna take a while.

There are workarounds, however, that would be immediately possible but would require refactoring, namely avoiding the programmatic use of exit handlers. Where exactly in the code base is that a problem?

Benjamin E. Coe · Answer 29 · Wed May 11 2016 01:43:33 GMT+0800 (China Standard Time)

@eljefedelrodeodeljefe the problem is less with yargs itself, and more with the consuming library. With optimist, yargs, and (I would guess) commander, there are commands that force an exit preventing the program consuming the library from attempting to handle the parser output:

var argv = require('yargs')(['--help'])
  .help()
  .argv

console.log('we should never get here');

The above code would never hit the console.log line, and would call process.exit(0). Perhaps this would be an acceptable workaround?

if (shouldExit) {
  process.stdout._handle.setBlocking(true);
  process.stderr._handle.setBlocking(true);
  console.log(yargs.help());
  process.exit(0);
}

That is, avoid setting stdout and stderr to blocking until we already know we are about to exit?

Robert Jefe Lindstädt · Answer 30 · Wed May 11 2016 01:52:30 GMT+0800 (China Standard Time)

dragging in @bnoordhuis. Have you worked on this in the meantime? Would that be an acceptable hotfix until we come up with a proper solution?

Sindre Sorhus · Answer 31 · Wed May 11 2016 01:59:54 GMT+0800 (China Standard Time)

Using process.exit() is common convention in CLI tools. The change in Node.js 6 has pretty much broken everything CLI related... I use process.exit() in meow which a lot of packages depend on (5,312,249 downloads in the last month).

process.exit() will be especially useful when ES2015 modules comes to Node.js, as we can then no longer return in the top-scope, so short-circuiting will be effectively impossible, without a nesting mess.

Robert Jefe Lindstädt · Answer 32 · Wed May 11 2016 02:03:38 GMT+0800 (China Standard Time)

Yeah, I know. However, this is really bad, sorry. It's something we need to live with now. Proper handling there would have been "returning from main" or using event emitters. The problems w/ synchronous and async behaviours are documented though.

Ah, the last point is interesting
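For illustration, "returning from main" could look like the sketch below. It assumes a CLI whose logic fits in one function; main and the --help / --fail flags are hypothetical names used only for this example. Early returns take the place of process.exit(), and the exit status is handed to Node through process.exitCode, so the process ends on its own once pending stdio writes are done.

```javascript
// Hypothetical CLI entry point: `main` and its flags are illustrative only.
function main(argv) {
  if (argv.includes('--help')) {
    console.log('Usage: tool [options]');
    return 0; // early return instead of process.exit(0)
  }
  if (argv.includes('--fail')) {
    console.error('Error: something wrong');
    return 1; // early return instead of process.exit(1)
  }
  console.log('doing the real work');
  return 0;
}

// Record the status and let Node exit by itself when the event loop is empty.
process.exitCode = main(process.argv.slice(2));
```

Because the short-circuiting happens inside a function, this shape would also work in a module where a top-level return is not allowed.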

Robert Jefe Lindstädt · Answer 33 · Wed May 11 2016 02:04:46 GMT+0800 (China Standard Time)

The solution probably will be to have a function forcing the flush. From a style point of view this whole exit handler business seems bad still though :(

Robert Jefe Lindstädt · Answer 34 · Wed May 11 2016 02:06:37 GMT+0800 (China Standard Time)

@Fishrock123 can you pick up @sindresorhus comment on not being able to return from top-scope in ES2015 modules. At least we'll probably need documentation about this.

kzc · Answer 35 · Wed May 11 2016 02:13:56 GMT+0800 (China Standard Time)

Perhaps this would be an acceptable workaround?

Only if you can guarantee that nothing else was output previously.

node 6.0.0 on Mac terminal:

$ node -e "console.log('The quick brown fox jumps.\n'.repeat(40000)); process.stdout._handle.setBlocking(true); console.log('Usage: ...'); process.exit(1);"
The quick brown fox jumps.
The quick brown fox jumps.
The quick brown fox jumps.
... 30 lines deleted ...
The quick brown fox jumps.
The quick brown fox jumps.
The quick brown $

kzc · Answer 36 · Fri May 13 2016 15:32:23 GMT+0800 (China Standard Time)

Patch to flush process.stdout and process.stderr upon process.exit() on unix:

https://github.com/kzc/node/commit/92fc9e0d992f043a4b92d9d286514328f5df1b6d

Tested successfully on Mac. Should work on Linux as well.

No attempt made at a Windows fix, but if one is needed it would follow the same idea in libuv. Not sure if this issue exists on Windows, as a few comments in the code suggest stdout/stderr blocks on that platform.

If someone wants to refine this patch and get it merged into node, go for it.

Robert Jefe Lindstädt · Answer 37 · Fri May 13 2016 20:12:56 GMT+0800 (China Standard Time)

I might have found something less intrusive: #6735 basically it's my favorite: a no-op :) Should work fine there though, passes tests and is backwards compatible.

Robert Jefe Lindstädt · Answer 38 · Fri May 13 2016 20:31:15 GMT+0800 (China Standard Time)

Scratch that. Need to refine...

Anna Henningsen · Answer 39 · Fri May 13 2016 21:35:51 GMT+0800 (China Standard Time)

I think @kzc’s suggestion is definitely worth pursuing, but I don’t know the situation on Windows either.

Jeremiah Senkpiel · Answer 40 · Fri May 13 2016 22:13:34 GMT+0800 (China Standard Time)

Whoops, only seeing this now. There are some things missing here, standby.

Jeremiah Senkpiel · Answer 41 · Fri May 13 2016 22:36:46 GMT+0800 (China Standard Time)

This is the original issue: #784

In it, @vkurchatkin found that this patch "fixes" the issue:

diff --git a/lib/net.js b/lib/net.js
index 030083d..efebd03 100644
--- a/lib/net.js
+++ b/lib/net.js
@@ -135,8 +135,7 @@ function Socket(options) {
     this._handle = createHandle(options.fd);
     this._handle.open(options.fd);
     if ((options.fd == 1 || options.fd == 2) &&
-        (this._handle instanceof Pipe) &&
-        process.platform === 'win32') {
+        (this._handle instanceof Pipe)) {
       // Make stdout and stderr blocking on Windows
       var err = this._handle.setBlocking(true);
       if (err)

There is also significant background in my attempted patch, using the above code: #1771

Namely this, by @bnoordhuis:

A bit of background: some years ago, I think it was in v0.7, it was decided to make stdout and stderr blocking. Turns out it doesn't work so well for pipes; ttys and files are usually very fast (local ones anyway) but pipes tend to fill up rapidly.

A number of people complained about it so we made stdio-to-pipe non-blocking again (except on Windows, where it's not supported.) I forgot the exact bug reports but the theme was that stdio was too slow; on OS X, the kernel pipe buffer is only about 4 kB, so it's easy to max out.

I believe the issue is now that people complain that output sometimes goes missing at program exit. Ideally, we'd have some way to tell libuv "flush only stdio writes, don't do other I/O" but that may not be straightforward to implement.

As an interim solution, this PR seems fine to me, although I can't predict if or how much it will break existing applications.

The last bit is why my patch didn't land. Smoke testing did not exist at the time, the patch is unideal and may break countless things downstream.

There are also links to @bnoordhuis's proposal to fix this in libuv: libuv/libuv#428, however it was decided that it is probably better that node handle this.

I don't seem to recall us ever finding where exactly it appeared though.

If memory serves right, and by looking at the results posted by @Qix-, I think this is where things started to change: libuv/libuv@b197515, because we started to open writable TTYs in non-blocking mode. Follow the commit trail for reasoning; reverting is not an option.

Parts of this issue should go back to pre-1.0.0 .. perhaps it was amplified recently but this sounds like a conflation of multiple issues now.

This is causing some fairly wonky behavior with yargs, two questions:

should we be classifying this as a bug (the new flushing behavior seems unintuitive), if so is there a separate tracking ticket I should be following?
is there a recommended workaround, or should I hold my horses for a patch.

@bcoe 1. Yes. 2. Avoid process.exit() to preserve chunked stdio writes.

The change in Node.js 6 has pretty much broken everything CLI related...

@sindresorhus This goes back to v1.0.0?

Again, sounds like multiple issues, or amplification of the existing one?

+1 for new function os.exit(Boolean) that drains stdout/stderr upon exit and leave process.exit() as is.

Strongly disagree. This is a bug that ought to be fixed.

My suggestion from "process: add process.exitSoon()" (#6477) is as follows:

Make process.exit() (or rather, void Exit()):

uv_stop() (I think) the event loop

or whatever to stop anything new from happening but while keeping the threads alive to do writes

attempt to flush any data

exit

At the same time, I don't think we should alter process.abort().

Edit: it is possible that this is out of scope for process.exit(), but even if we add something new (which should be the fallback), it should have that behavior.

I'm now pretty sure it is within scope, although I'm not sure how possible my idea is.

Note: I have not yet had time to look at @kzc's patch.

Saúl Ibarra Corretgé · Answer 42 · Fri May 13 2016 22:59:21 GMT+0800 (China Standard Time)

uv_stop() (I think) the event loop

or whatever to stop anything new from happening but while keeping the threads alive to do writes

Writes happen in the loop thread, there are no other threads doing the writes. So if the loop is stopped no data will be written.

Note: I have not yet had time to look at @kzc's patch.

The patch is basically what @bnoordhuis proposed but at the handle level instead of a single global function.

Jeremiah Senkpiel · Answer 43 · Fri May 13 2016 23:03:53 GMT+0800 (China Standard Time)

Writes happen in the loop thread, there are no other threads doing the writes. So if the loop is stopped no data will be written.

Hmmm. What I mean by that may be more useful then: Shut down as much as possible so no other JS code runs.

Saúl Ibarra Corretgé · Answer 44 · Fri May 13 2016 23:05:46 GMT+0800 (China Standard Time)

I see. So some uv_walk + uv_close all handles except the ones in use for stdio + one last uv_run then.

Robert Jefe Lindstädt · Answer 45 · Fri May 13 2016 23:16:57 GMT+0800 (China Standard Time)

I was thinking about this too. Seems legit. Is it not possible to do this in streamwrap, too? Attach process.exit to the last write of the stream that is currently happening?
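In user land, the idea of attaching the exit to the last write can be sketched with the callback that stream.write() accepts. This is only an approximation under stated assumptions: writeThenExit is a hypothetical helper, not a Node.js API, and it only covers data written through the same stream before the final message.

```javascript
// Hypothetical helper (not part of Node.js): write a final message and exit
// only after the stream reports that this write has been handled.
// `exit` defaults to process.exit and is a parameter so it can be replaced.
function writeThenExit(stream, message, code, exit = process.exit) {
  stream.write(message, () => exit(code));
}
```

A CLI could call writeThenExit(process.stderr, 'Error: something wrong\n', 1) in place of console.error() followed by process.exit(1).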

James M Snell · Answer 46 · Sat May 14 2016 00:01:55 GMT+0800 (China Standard Time)

@saghul ... I would describe it slightly differently: What we need is essentially a uv_graceful_stop() that:

Puts the loop into a 'stopping' mode that disallows any new requests on handles,
Closes all handles that do not have existing pending requests,
Allows existing pending requests on handles to complete,
Closes the remaining handles when all requests are complete,
Optionally creates a timer to force handles to close / requests to cancel if requests take too long to complete.
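To make the steps above concrete, here is a rough JavaScript analogy. It is not libuv code: uv_graceful_stop() does not exist in libuv, and GracefulStopper is a hypothetical class that tracks promises instead of handles and requests. It models only the 'stopping' flag, waiting for pending requests, and the optional timeout; closing handles is left out.

```javascript
// Illustrative analogy only; none of these names exist in libuv or Node.js.
class GracefulStopper {
  constructor() {
    this.stopping = false;
    this.pending = new Set();
  }

  // Register a request (a promise). Refused once stopping has begun.
  track(promise) {
    if (this.stopping) {
      return Promise.reject(new Error('stopping: no new requests accepted'));
    }
    this.pending.add(promise);
    const done = () => this.pending.delete(promise);
    promise.then(done, done);
    return promise;
  }

  // Enter 'stopping' mode and wait for pending requests to settle.
  // With `timeoutMs`, give up after that delay.
  // Resolves to 'drained' or 'timeout'.
  stop(timeoutMs) {
    this.stopping = true;
    const drained = Promise.allSettled([...this.pending]).then(() => 'drained');
    if (timeoutMs === undefined) return drained;

    let timer;
    const timedOut = new Promise((resolve) => {
      timer = setTimeout(() => resolve('timeout'), timeoutMs);
    });
    return Promise.race([drained, timedOut]).finally(() => clearTimeout(timer));
  }
}
```

The design choice worth noting is that new work is refused as soon as stop() is called, so the set of pending requests can only shrink.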

Jeremiah Senkpiel · Answer 47 · Sat May 14 2016 00:16:45 GMT+0800 (China Standard Time)

Allows existing pending requests on handles to complete,

Closes the remaining handles when all requests are complete,

I'm not sure these are within scope, if you're calling process.exit() you are telling the process to ignore other connections and shut down?

Optionally creates a timer to force handles to close / requests to cancel if requests take too long to complete.

That's definitely going somewhere beyond this imo

James M Snell · Answer 48 · Sat May 14 2016 00:28:37 GMT+0800 (China Standard Time)

I respectfully disagree. Allowing the existing requests to complete is what this discussion is about, yes? For me the requirement here is to have a graceful exit option. There are times when what you want is to shutdown immediately without completing the pending tasks and there are times when what you want is to shutdown cleanly with pending tasks completed or given a chance to clean up. Personally I do not want (nor do I believe it is necessary) to change the existing behavior of process.exit(). What I want is the ability to simply say, "Hey, we're shutting things down now, please finish what you're doing".

Jeremiah Senkpiel · Answer 49 · Sat May 14 2016 00:31:22 GMT+0800 (China Standard Time)

Allowing the existing requests to complete is what this discussion is about, yes?

This is about having stdio finish chunked writes, mostly.

There are times when what you want is to shutdown immediately without completing the pending tasks and there are times when what you want is to shutdown cleanly with pending tasks completed or given a chance to clean up.

I don't disagree, but that isn't process.exit()'s worry.

Personally I do not want (nor do I believe it is necessary) to change the existing behavior of process.exit(). What I want is the ability to simply say, "Hey, we're shutting things down now, please finish what you're doing".

Sure, but this is actually a bug. This used to work, why it doesn't currently is complex and awkward, but we should still fix it. (Also it violates users expectations far beyond just exiting before other connections.)

Saúl Ibarra Corretgé · Answer 50 · Sat May 14 2016 00:42:43 GMT+0800 (China Standard Time)

What we need is essentially a uv_graceful_stop() that:

I'm not sure how useful this is for the general public, seems very Node specific, but let's see:

Puts the loop into a 'stopping' mode that disallows any new requests on handles,

Doable with a flag on the loop.

Closes all handles that do not have existing pending requests,

Not all handles have requests associated with them, and there are also standalone requests. If we close all handles which don't have requests we'd also close the idle and check handles used for process.nextTick, it would turn things into a royal mess.

Allows existing pending requests on handles to complete,

Sure.

Closes the remaining handles when all requests are complete,

We'd need something new here, since currently, if you close a handle with pending requests, they are cancelled (if possible).

Optionally creates a timer to force handles to close / requests to cancel if requests take too long to complete.

Not all requests are cancellable.

Overall I think this is an overkill approach for the problem at hand and I don't see it happening any time soon (unless someone wants to volunteer the time to come up with a thorough design proposal and implementation). This is about flushing stdio streams on exit, which is a relatively specific task IMHO.

kzc · Answer 51 · Sat May 14 2016 00:46:08 GMT+0800 (China Standard Time)

This is about having stdio finish chunked writes, mostly.

First the pending uv writes that node no longer has control over, then the chunked node writes.

This used to work

It never truly worked on UNIX, just the write thresholds were somewhat higher. As of node 6.x Mac buffers 1024 bytes of writes at the OS level and Linux buffers 64KB of writes. Non blocking writes will always succeed up until this threshold.

FWIW, the patch https://github.com/kzc/node/commit/92fc9e0d992f043a4b92d9d286514328f5df1b6d completely flushes stdio uv writes and node stream chunked writes at process.exit(). I think this behavior is expected by node users.

What we need is essentially a uv_graceful_stop()

I'd be in favor of that as long as no callbacks are invoked by libuv during such a blocking call so that no more work can be scheduled in user land. And only pending writes to stdio and files are flushed. The socket stream data can be in flight at the OS level anyway and no attempt should be made to flush them in my opinion.

Jeremiah Senkpiel · Answer 52 · Sat May 14 2016 00:48:37 GMT+0800 (China Standard Time)

It never truly worked on UNIX

Sometime prior to v1.0.0, it actually did, as far as I can tell. Again, conflating the original issue, with the amplification of it. :)

kzc · Answer 53 · Sat May 14 2016 00:51:03 GMT+0800 (China Standard Time)

Prior to v1.0.0, it actually did

Before my time and a lot of other node developers.

Instead of looking at history I think we should just address the problems as they stand now.

Jeremiah Senkpiel · Answer 54 · Sat May 14 2016 00:52:56 GMT+0800 (China Standard Time)

Instead of looking at history I think we should just address the problems as they stand now.

It's the same problem.

Saúl Ibarra Corretgé · Answer 55 · Sat May 14 2016 00:55:34 GMT+0800 (China Standard Time)

Instead of looking at history I think we should just address the problems as they stand now.

Looking at history is important not to make the same mistake twice. There is tons of knowledge which is unfortunately not very accessible, in the form of commit logs, but one has to know what they're looking for.

Your approach could work, but the libuv bits will need polishing and Windows support.

kzc · Answer 56 · Sat May 14 2016 01:01:32 GMT+0800 (China Standard Time)

Looking at history is important not to make the same mistake twice. There is tons of knowledge which is unfortunately not very accessible, in the form of commit logs, but one has to know what they're looking for.

Fair enough.

Your approach could work, but the libuv bits will need polishing and Windows support.

I'm the first to admit my libuv change was a hack. It's a working proof of concept. I was going to code up something to walk the event queue and flush pending write events when I discovered that uv__write already did exactly what I needed.

For the record, does Windows as of node 6.x exhibit this stdio not flushing upon process.exit() problem? The expected fail test case for this issue was not run on Windows suggesting to me that it might have actually worked. My patch moved that expected fail test to be a normal (passing) test.

Edit: the renamed test is here:

https://github.com/kzc/node/blob/92fc9e0d992f043a4b92d9d286514328f5df1b6d/test/parallel/test-stdout-buffer-flush-on-exit.js

Benjamin E. Coe · Answer 57 · Sat May 14 2016 01:07:41 GMT+0800 (China Standard Time)

I think this is more of a special case @jasnell. If I call process.exit() it's my expectation that the process will exit within a tick; if there's an outbound HTTP request, as an example, I wouldn't want to wait for this to complete prior to killing the process.

However, flushing stdio is an exception:

folks writing CLI code have been writing it with the expectation that stdio flushes within one tick, if the process chooses to exit early using process.exit() (CC: @sindresorhus)
contrary to an outbound HTTP request, we know that flushing stdout, stderr, etc., should flush in a finite amount of time.

Carving out an exception for stdio keeps the platform's behavior in line with what the community is accustomed to.

Sam Roberts · Answer 58 · Sat May 14 2016 05:25:54 GMT+0800 (China Standard Time)

The addition of process.exitCode made this much better. The correct way to exit a node app that has pending I/O (e.g. it called console.log) is now to set the exit code and to close all handles, timers, etc., i.e., to do a graceful exit. This can be a bit annoying, in that all your resources need to be tracked and closeable, but they should be anyhow if you want to exit gracefully!

This works well for us... except for node v0.10, which has no process.exitCode.... on v0.10 you need to emulate it by setting a global code, then doing the graceful resource cleanup... then in the on exit handler call process.exit(code)... which is pretty ugly. I wish we could backport process.exitCode to v0.10.

Part of the problem here is that process.exit() just seems to be a reasonable way to exit node... when it isn't really, it's a "terminate with prejudice" directive. This is compounded by the fact that on Unix only (not Windows), console.log used to have a special exemption to make it blocking if and only if it was a terminal... not if it was piped. So people learnt to rely on this quirk.
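
A minimal sketch of the graceful exit described above, assuming a hypothetical helper `exitGracefully` and a list of resources that each expose `close()` (neither is from this thread). It sets process.exitCode instead of calling process.exit(), so node exits on its own once the event loop drains and pending stdio writes have completed.

```javascript
// Hypothetical helper: the name and the `close()` convention are assumptions.
// `proc` defaults to the real process; it is a parameter only to ease testing.
function exitGracefully(code, resources, proc) {
  proc = proc || process;
  // Record the desired exit status without terminating the process.
  proc.exitCode = code;
  // Release everything that keeps the event loop alive (servers, timers, ...).
  resources.forEach((r) => r.close());
  // No process.exit() here: node exits by itself once nothing is pending,
  // which gives queued stdout/stderr writes the chance to finish.
}

// Usage sketch (not executed here):
//   const timer = setInterval(() => {}, 1000);
//   console.log('fatal: something went wrong');
//   exitGracefully(1, [{ close: () => clearInterval(timer) }]);
```

The cost is the one mentioned above: every resource has to be tracked so that it can be closed.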

Sam Roberts · Answer 59 · Sat May 14 2016 05:27:52 GMT+0800 (China Standard Time)

Oh, and CLI parsers that are calling console.log and then calling process.exit... that's just doing it wrong. They need to call log and throw an exception, and let the caller catch it and clean up gracefully before exit.
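
The throw-and-catch pattern suggested here could look roughly like this. `UsageError`, `parseArgs` and `main` are hypothetical names for illustration and do not correspond to the API of yargs, commander or any other parser.

```javascript
// Hypothetical error type carrying the desired exit status.
class UsageError extends Error {
  constructor(message, exitCode) {
    super(message);
    this.name = 'UsageError';
    this.exitCode = exitCode;
  }
}

// Hypothetical parser: reports problems by throwing, never by exiting.
function parseArgs(argv) {
  if (argv.indexOf('--help') !== -1) {
    throw new UsageError('usage: tool [--help] <file>', 0);
  }
  if (argv.length === 0) {
    throw new UsageError('error: missing <file> argument', 1);
  }
  return { file: argv[0] };
}

// The caller decides how to shut down. `proc` and `log` are parameters
// only so the sketch can be exercised without touching the real process.
function main(argv, proc, log) {
  proc = proc || process;
  log = log || console.error;
  try {
    return parseArgs(argv);
  } catch (err) {
    if (!(err instanceof UsageError)) throw err;
    log(err.message);
    proc.exitCode = err.exitCode; // graceful: no process.exit()
    return null;
  }
}
```

As the replies below point out, adopting this in an existing parser changes its contract with every consumer, so it is easier to apply to new code than to retrofit.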

Benjamin E. Coe · Answer 60 · Sat May 14 2016 05:51:24 GMT+0800 (China Standard Time)

@sam-github this is a pattern that has been used by almost every CLI application in the ecosystem since the beginning of time, to terminate execution before it applies to the libraries' consumer -- the alternative would be literally 10000+ consuming libraries changing their contract with commander, yargs, optimist, meow, tap, the list goes on.

isaacs · Answer 61 · Sat May 14 2016 05:58:24 GMT+0800 (China Standard Time)

Saying "you should just do a graceful exit always" is not a valuable answer in this case, in the sense that it does not meaningfully move us towards a solution state. (Unless, of course, it's followed up with several dozen (hundred?) pull requests to refactor the many impacted programs in the suggested manner.)

The fact is that the contract changed, pretty dramatically. Maybe not the intended or documented contract, but the actual contract (as in, the way that node actually works) changed very dramatically here.

It's fine to point the finger at the userbase and say "Welp, they were doing it wrong, too bad", but the platform is mature and that kind of casual breakage is foolish. Many extant programs won't work on Node 6 because of this. Who's right or wrong hardly matters if users can't use your platform, and complaining about an un-boiled ocean doesn't solve the problem.

What's the sense in arguing when you're all alone?

Robert Jefe Lindstädt · Answer 62 · Sat May 14 2016 06:01:37 GMT+0800 (China Standard Time)

I think both positions are arguable and can be combined in a fix in node. But I also think use of process.exit() was always, and I cannot stress this enough, always wrong and should be slowly discouraged. By the way, so was process.reallyExit, so it's just historic... very historic. Looking at the repos in question, fixes for gracefully exiting seem quite easy though.

Benjamin E. Coe · Answer 63 · Sat May 14 2016 06:04:10 GMT+0800 (China Standard Time)

@eljefedelrodeodeljefe I'm comfortable with this, I'm fine with pushing people towards the new behavior with yargs@5.x, however there are 13,000 consuming libraries on prior versions and I don't want to break them.

isaacs · Answer 64 · Sat May 14 2016 06:04:52 GMT+0800 (China Standard Time)

@eljefedelrodeodeljefe Up until relatively recently, there was literally no other way to set the exit code of a node process other than calling process.exit(), and it's been safe to assume that stdio streams would synchronously flush on exit for a very long time.

It's fine and good to complain about what's right or wrong. Ok, it was wrong to call process.exit, always and forever, we should feel shame for this. So we feel shame. Who cares? Node 6 broke a very significant number of node programs with this change in behavior.

So do we care about that breakage or not? It seems to me that turning our back on tens of thousands of broken programs is a choice that should not be taken lightly!

kzc · Answer 65 · Sat May 14 2016 06:11:23 GMT+0800 (China Standard Time)

Updated process.exit fix: ~~https://github.com/kzc/node/commit/a73ec2f007020aed2837800afc2169ace11d4c02~~ https://github.com/kzc/node/commit/29997921800e00a22d9f92d24704a0021be03bbf

Robert Jefe Lindstädt · Answer 66 · Sat May 14 2016 06:11:35 GMT+0800 (China Standard Time)

@bcoe @isaacs sure, I care deeply about fixing it, especially because the use is so widespread and CLIs are so crucial for the ecosystem. I haven't seen a decent fix just yet :( My point is just that it should be a combined effort. If you see ways of mitigating this in future versions of the repos, please go ahead immediately, and node should fix everything backwards.

Funny enough I was suggesting using the EE instead of exit handlers as main control flow of a CLI some months ago but didn't see at the time that it's gonna explode that much.

Let's come up with good ideas over the weekend! :)

kzc · Answer 67 · Sat May 14 2016 06:22:53 GMT+0800 (China Standard Time)

@eljefedelrodeodeljefe Not sure what you're looking for in a fix that isn't some variation of the simple patch I proposed. stdio libuv queued writes and queued node chunked writes have to be flushed in a sync manner prior to exit(). That's all.
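
For programs that cannot wait for a fix in node itself, a userland approximation is to delay the exit until the final write's callback has fired. This is a sketch under assumptions, not the libuv/node patch discussed above: `writeThenExit` is a hypothetical helper, and it only covers output written through the given stream.

```javascript
// Hypothetical helper. stream.write(chunk, callback) is the standard
// Writable API; the callback runs once that chunk has been handled, and
// chunks are processed in the order they were written.
function writeThenExit(stream, text, code, exit) {
  exit = exit || ((c) => process.exit(c)); // injectable for testing
  stream.write(text, () => exit(code));
}

// Usage sketch (not executed here):
//   writeThenExit(process.stderr, 'fatal: bad input\n', 1);
```

This does nothing for code that calls process.exit() directly, which is why a fix inside node is still the goal of this thread.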

Jeremiah Senkpiel · Answer 68 · Sat May 14 2016 08:00:07 GMT+0800 (China Standard Time)

and it's been safe to assume that stdio streams would synchronously flush on exit for a very long time.

Node 6 broke a very significant number of node programs with this change in behavior.

Can someone point me to exactly how the behavior changed in v6? As far as I am aware, it is only easier to trigger, and that it is indeed the issue I mentioned previously. (I.e. writes become chunked at a smaller size) Is that true? (@saghul?)

Josh Junon · Answer 69 · Sat May 14 2016 08:18:21 GMT+0800 (China Standard Time)

@kzc submit it as a PR here.

kzc · Answer 70 · Sat May 14 2016 11:28:37 GMT+0800 (China Standard Time)

@Qix- I just put together the proof of concept to show this issue could be solved in a straightforward manner. I'm hoping someone else would run with it and do the Windows piece - assuming it is required. Because the patch touches libuv it might be better for those folks to do that part. Presently uv_flush_sync, a.k.a. uv__write, does not return any error code. Inspecting such an error code would be wise before flushing the node-side stream chunks, otherwise there could be a gap in the data. Or perhaps the libuv team would prefer to implement some variation of @jasnell's uv_graceful_stop() proposal. Either would be fine with me.

kzc · Answer 71 · Sat May 14 2016 11:54:56 GMT+0800 (China Standard Time)

@saghul Is there a libuv function that provides the number of bytes pending to be written for a uv_stream_t*? That info would be sufficient to know whether the flushing of the node-side chunks can proceed - i.e., when the number of libuv bytes pending to be written for the stream is 0. If this function does not already exist it would be useful if uv__write() were to return that value.

Saúl Ibarra Corretgé · Answer 72 · Sat May 14 2016 17:16:03 GMT+0800 (China Standard Time)

Can someone point me to exactly how the behavior changed in v6? As far as I am aware, it is only easier to trigger,

That's my understanding. While we were doing async writes before 1.0, it's possible that the pty fix which landed in 1.9.0 also fixed it on OSX, thus making them really async.

I think we haven't understood 100% where the problem is. We do see its effect. So, it would be interesting to take Node 5 and see if stdout writes block or not on OSX. Then the same with Node 6 (which do not block).

Saúl Ibarra Corretgé · Answer 73 · Sat May 14 2016 17:17:47 GMT+0800 (China Standard Time)

Is there a libuv function that provides the number of bytes pending to be written for a uv_stream_t*?

@kzc see: http://docs.libuv.org/en/v1.x/stream.html#c.uv_stream_t.write_queue_size

Jeremiah Senkpiel · Answer 74 · Sat May 14 2016 22:54:15 GMT+0800 (China Standard Time)

it's possible that the pty fix which landed in 1.9.0 also fixed it on OSX, thus making them really async.

I don't think so, if you run https://github.com/nodejs/node/blob/master/test/known_issues/test-stdout-buffer-flush-on-exit.js on an OS X machine prior to that it also happens as expected.

I think we haven't understood 100% where the problem is. We do see its effect.

My understanding at the time when I investigated it with bnoordhuis was that flushing wasn't happening at the OS level, I think?

So, it would be interesting to take Node 5 and see if stdout writes block or not on OSX. Then the same with Node 6 (which do not block).

Hmmmm, I'm quite certain it did not block but I don't have proof of that, or at least not off-hand.

kzc · Answer 75 · Sun May 15 2016 00:32:46 GMT+0800 (China Standard Time)

Revised process.exit fix with improved error checking: https://github.com/kzc/node/commit/29997921800e00a22d9f92d24704a0021be03bbf

kzc · Answer 76 · Sun May 15 2016 01:45:59 GMT+0800 (China Standard Time)

@Fishrock123 The patch https://github.com/kzc/node/commit/29997921800e00a22d9f92d24704a0021be03bbf renames the test you mentioned to test/parallel/test-stdout-buffer-flush-on-exit.js and runs it successfully on Mac and Linux.

Benjamin E. Coe · Answer 77 · Sun May 15 2016 02:43:41 GMT+0800 (China Standard Time)

for the public record, here's the workaround that I'm about to land for yargs (commander should be able to use a similar approach):

yargs/yargs#501

Jeremiah Senkpiel · Answer 78 · Sun May 15 2016 05:40:14 GMT+0800 (China Standard Time)

I did some more digging for the yargs issue. Here's some old discussion of "Fix blocking / non-blocking stdio woes": nodejs/node-v0.x-archive#3584

Currently (At the time of linked issue) process.stdin / stdout / stderr is blocking, except when it is a pipe on windows. Weird and surprising. Very unpractical in cases where stdio is used as an IPC mechanism between node processes.

Also, net.Socket#_handle.setBlocking() appears to have been added in 20176a9 (v0.11.2)

Benjamin E. Coe · Answer 79 · Sun May 15 2016 05:51:38 GMT+0800 (China Standard Time)

created a shim here for anyone else running into this issue.

kzc · Answer 80 · Sun May 15 2016 06:45:20 GMT+0800 (China Standard Time)

@bcoe Be aware that calling setBlocking(true) is not a cure-all. If a large write takes place before setBlocking(true), it does not work.

See: #6456 (comment)

That's why the process.exit() issue should be addressed in node itself and back ported.
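
Given that caveat, the workaround only has a chance of helping if it runs before anything is written. A sketch, assuming the same private `_handle.setBlocking` hook used in the temporary fix at the top of this issue (an internal, undocumented property that may change); `makeStdioBlocking` is a hypothetical name:

```javascript
// Hypothetical helper wrapping the temporary fix from the top of this issue.
// `_handle` is a private node internal, so every access is guarded.
function makeStdioBlocking(streams) {
  streams = streams || [process.stdout, process.stderr];
  let changed = 0;
  streams.forEach((s) => {
    if (s && s.isTTY && s._handle &&
        typeof s._handle.setBlocking === 'function') {
      s._handle.setBlocking(true);
      changed++;
    }
  });
  return changed; // number of streams switched to blocking mode
}

// Call this first thing in the entry script, before any console.log or
// process.stdout.write, since data queued earlier may still be truncated:
//   makeStdioBlocking();
```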

Saúl Ibarra Corretgé · Answer 81 · Sun May 15 2016 18:23:23 GMT+0800 (China Standard Time)

Is test-stdout-buffer-flush-on-exit.js supposed to be reliable? It fails for me on Node 4, 5 and 6 on OSX and on Node 5 and 6 on Linux (haven't tested 4 there).

Saúl Ibarra Corretgé · Answer 82 · Sun May 15 2016 18:44:35 GMT+0800 (China Standard Time)

Is there a reliable test we can run with git bisect to try to understand where the problem originates? This one also fails for me with Node 4 on Linux. I'm really confused now. :-S

kzc · Answer 83 · Sun May 15 2016 21:48:09 GMT+0800 (China Standard Time)

@saghul test-stdout-buffer-flush-on-exit.js is an expected-fail on node 4, 5 and 6 (without my patch).

Here's another program that fails to run as expected on node 4, 5 and 6 in a unix terminal:

// this program populates the libuv write queue upon first write over 64K
// then will populate the node stream chunk queue for subsequent writes.
for (var i = 1; i <= 1000; ++i) {
  process.stdout.write((i + 
    ': The quick brown fox jumps over the lazy dog.\n').repeat(1500));
}
process.exit(1);

It will run successfully on Mac and Linux with https://github.com/kzc/node/commit/29997921800e00a22d9f92d24704a0021be03bbf

Saúl Ibarra Corretgé · Answer 84 · Sun May 15 2016 21:54:13 GMT+0800 (China Standard Time)

@kzc I know your patch will make it work. What I want to know is what and where changed subtly between Node 4 and Node 6, since the core principle (writes being async) remained.

IOW, we need a test which passes in 4 (and maybe 5) but doesn't in 6.

kzc · Answer 85 · Sun May 15 2016 22:16:38 GMT+0800 (China Standard Time)

The behavior only noticeably changed on Mac with node 6.0.0/libuv 1.9.0 at the tty: #6456 (comment)

Do not pipe or redirect the output as it changes the test and a different code path is taken in node. test-stdout-buffer-flush-on-exit.js is a pipe test, not a tty test.

It appears as if the fd is blocking at the tty in node 4 and 5 on Mac, but non-blocking on node 6 at the tty.

Linux behavior is the same in node 4, 5 and 6 - 64K is output upon process.exit() at tty before truncation.

kzc · Answer 86 · Sun May 15 2016 22:29:17 GMT+0800 (China Standard Time)

But even when stdio is piped, node 4, 5 and 6 never completely flushed stdio upon process.exit(). I believe these stdio streams ought to be completely flushed regardless of being run at the tty or piped or redirected to a file.
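
One way to check the piped case on a given node version is to have a parent process count the bytes a child manages to emit before process.exit(). This is a rough sketch, not the test from the node repository; `measureFlush` and the payload size are arbitrary choices for illustration.

```javascript
const spawnSync = require('child_process').spawnSync;

// Hypothetical helper: runs a child node process whose stdout is a pipe,
// lets it write `size` bytes and exit immediately, and reports how many
// bytes actually arrived in the parent.
function measureFlush(size) {
  const script =
    'process.stdout.write("x".repeat(' + size + ')); process.exit(0);';
  const res = spawnSync(process.execPath, ['-e', script], {
    maxBuffer: size * 2 + 1024, // avoid the parent cutting the output short
  });
  if (res.error) throw res.error;
  return { expected: size, received: res.stdout.length };
}

// Usage sketch (not executed here):
//   const r = measureFlush(1024 * 1024);
//   console.log(r.received + ' of ' + r.expected + ' bytes received');
```

If `received` is smaller than `expected`, the child's output was truncated on exit. The result depends on platform and node version, so no particular number is implied here.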

Jeremiah Senkpiel · Answer 87 · Sun May 15 2016 22:29:20 GMT+0800 (China Standard Time)

@saghul known_issues tests fail if the bug exists. Contrary to what @kzc says, this bug exists since Node.js (io.js) v1.0.0. It does not appear to exist in v0.12 (.x) or before. The recent libuv 1.9.0 patch appears to have made it so that chunked writes are triggered at much smaller buffer(?) sizes, meaning the bug appears far more easily.

It appears as if the fd is blocking at the tty in node 4 and 5 on Mac, but non-blocking on 6 at the tty.

@kzc This isn't correct as far as my investigations have gone (see above..), but when chunked only the first chunk is flushed on (fast) exit.

kzc · Answer 88 · Sun May 15 2016 22:32:53 GMT+0800 (China Standard Time)

@Fishrock123 I'm aware of what known_issues is for. I did not mention node v1.0.0, nor v0.12.

tty behavior is different than piped behavior.

My patch fixes flushing of stdio at the tty or when piped upon process.exit().

Jeremiah Senkpiel · Answer 89 · Sun May 15 2016 22:35:52 GMT+0800 (China Standard Time)

tty behavior is different than piped behavior.

Ah, right. Perhaps that has changed. I have not taken a look.

kzc · Answer 90 · Sun May 15 2016 22:39:58 GMT+0800 (China Standard Time)