nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨

Home Page: https://nodejs.org

stdio buffered writes (chunked) issues & process.exit() truncation

eljefedelrodeodeljefe opened this issue · comments

If this is currently breaking your program, please use this temporary fix:

[process.stdout, process.stderr].forEach((s) => {
  s && s.isTTY && s._handle && s._handle.setBlocking &&
    s._handle.setBlocking(true)
})

  • Version: v6, (likely all and backportable)
  • Platform: all
  • Subsystem: process

As noted in #6297, async stdio is not flushed when process.exit() is called immediately afterwards. This may expose a more general deficiency, namely that C++ functions calling C exit() are not properly unwound, and it was probably not introduced only by the latest libuv updates. We should consider adding flushing, providing a graceful exit, and/or improving the unwinding of C++ stacks.

cc @jasnell, @kzc, @Qix-, @bnoordhuis

Issues

Discussion has already been taking place in several places, e.g. #6297, #6456, #6379

Summaries of Proposals

Proposals are not mutually exclusive and could lead to semantically unrelated contributions.

  • aid with process.stdout.flush()
  • process.setBlocking(true)
  • node --blocking-stdio
  • longjmp() towards main at exit in C++
  • move parts of process.exit() / process.reallyExit() to new method os.exit()
  • golang panic()- or c++ throw-like stack unwinding

Discussions by Author (with content)


@ChALkeR
I tried to discuss this some time ago at IRC, but postponed it for quite a long time. Also I started the discussion of this in #1741, but I would like to extract the more specific discussion to a separate issue.

I could miss some details, but will try to give a quick overview here.

Several issues here:

  1. Many calls to console.log (e.g. calling it in a loop) could chew up all the memory and die: #1741, #2970, #3171.
  2. console.log has different behavior while printing to a terminal and being redirected to a file: #1741 (comment).
  3. Output is sometimes truncated: #6297, there were other ones as far as I remember.
  4. The behaviour seems to differ across platforms.

As I understand it — the output has an implicit write buffer (as it's non-blocking) of unlimited size.

One approach to fixing this would be to:

  1. Introduce an explicit cyclic write buffer.
  2. Make writes to that cyclic buffer blocking.
  3. Make writes from the buffer to the actual output non-blocking.
  4. When the cyclic buffer reaches its maximum size (e.g. 10 MiB), block further writes to the buffer until a corresponding part of it is freed.
  5. On (normal) exit, make sure the buffer is flushed.

For almost all cases, except for the ones that are currently broken, this would behave as a non-blocking buffer (because writes to the buffer are considerably faster than writes from the buffer to file/terminal).

For cases when the data is being piped to the output too quickly and when the output file/terminal does not manage to output it at the same rate — the write would turn into a blocking operation. It would also be blocking at the exit until all the data is written.

Another approach would be to monitor (and limit) the size of data that is contained in the implicit buffer coming from the async queue, and make the operations block when that limit is reached.
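The idea of limiting how much data sits in the implicit buffer can be approximated in user land with the stream back-pressure signals Node.js already exposes. The following is a minimal sketch, not the proposed core change: `writeAllWithBackpressure` is a hypothetical helper name, and it relies only on `stream.write()` returning `false` once the buffered data reaches `highWaterMark` and on the `'drain'` event.

```javascript
'use strict';
const { once } = require('events');

// Hypothetical helper (not part of Node.js): writes every chunk to `stream`,
// pausing whenever the stream's internal buffer has reached its highWaterMark.
async function writeAllWithBackpressure(stream, chunks) {
  for (const chunk of chunks) {
    // write() returns false once the buffered data reaches highWaterMark.
    if (!stream.write(chunk)) {
      // Wait until the buffered data has been handed off before queueing more.
      await once(stream, 'drain');
    }
  }
}
```

Called as `await writeAllWithBackpressure(process.stdout, lines)`, this keeps the amount of queued output bounded instead of letting a tight loop grow the buffer without limit. It does not make the writes blocking, so it does not by itself address truncation on process.exit().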

Perhaps a list of issues this would address and/or close would be helpful to include since this seems to be a sprawling issue with a lot of fragmented discussion.

Yes, just a little late in Europe :( Keep 'em coming and I'll add them above.

Also see #6410

Considering all the clarification in #6410, is there also a theoretical possibility that not only several I/O calls to stdout could fail to make it, but that even one simple console.log() before process.exit() could be truncated or discarded?

@vsemozhetbyt that is especially correct if I'm understanding your question correctly.

To reproduce you can do

require('crypto').randomBytes(100000000, function(err, buffer) {
  if (err) throw err;
  var token = buffer.toString('hex');
  console.log(token);
  process.exit(0);
});

Edit: @addaleax's hint: test does a similar thing. Sorry @addaleax

commented

@vsemozhetbyt This output is truncated with node 6.0.0 on Mac after approx 40 lines:

node -e 'console.log("The quick brown fox jumps.\n".repeat(40000)); process.exit(7);'

node 5.x and earlier output all 40000 lines on Mac.

So now, if the user does not want to restructure the code, the only option is to write something like

const err = {name: 'Error', message: 'something wrong'};
throw err;

instead of

console.log('Error: something wrong');
process.exit(1);

and to deal with all the uncontrolled clutter of debug output?

commented

throw err;
and to deal with all the uncontrolled clutter of debug output?

For dev code, sure, but uncaught exceptions in production code are not very elegant or professional.

commented

Related: #6379

Also discusses process.stdio.setBlocking(Boolean)

Added @ChALkeR's thread and updated this issue with some summaries and stuff.

@kzc

node 5.x and earlier output all 40000 lines on Mac.

Not sure what you're talking about.

#!/usr/bin/env bash
. ~/.nvm/nvm.sh

uname -a
echo

function do_buffer_test {
    node <<< 'console.log((new Array(40000)).join("Hello! this is a test!\n"));' | wc -l
    node <<< 'console.log((new Array(40000)).join("Hello! this is a test!\n")); process.exit(1)' | wc -l
    node <<< 'console.log((new Array(40000)).join("Hello! this is a test!\n")); process.reallyExit(1)' | wc -l
    node <<< 'console.log((new Array(40000)).join("Hello! this is a test!\n")); process.abort()' | wc -l
    node <<< 'for (var i = 0; i < 40000; i++) console.log("Hello! this is a test!");' | wc -l
    node <<< 'for (var i = 0; i < 40000; i++) console.log("Hello! this is a test!"); process.exit(1)' | wc -l
    node <<< 'for (var i = 0; i < 40000; i++) console.log("Hello! this is a test!"); process.reallyExit(1)' | wc -l
    node <<< 'for (var i = 0; i < 40000; i++) console.log("Hello! this is a test!"); process.abort()' | wc -l
}

nvm install 0.10
do_buffer_test

nvm install 0.12
do_buffer_test

nvm install 1
do_buffer_test

nvm install 2
do_buffer_test

nvm install 3
do_buffer_test

nvm install 4
do_buffer_test

nvm install 5
do_buffer_test

nvm install 6
do_buffer_test

$ ./test-buffers.sh
Darwin JunonBox.local 15.4.0 Darwin Kernel Version 15.4.0: Fri Feb 26 22:08:05 PST 2016; root:xnu-3248.40.184~3/RELEASE_X86_64 x86_64

v0.10.44 is already installed.
Now using node v0.10.44 (npm v2.15.0)
   40000
   40000
   40000
   40000
   40000
   40000
   40000
   40000
v0.12.13 is already installed.
Now using node v0.12.13 (npm v2.15.0)
   40000
   40000
   40000
   40000
   40000
   40000
   40000
   40000
iojs-v1.8.4 is already installed.
Now using io.js v1.8.4 (npm v2.9.0)
   40000
    2849
    2849
    2849
   40000
   40000
   40000
   40000
iojs-v2.5.0 is already installed.
Now using io.js v2.5.0 (npm v2.13.2)
   40000
    2849
    2849
    2849
   40000
   40000
   40000
   40000
iojs-v3.3.1 is already installed.
Now using io.js v3.3.1 (npm v2.14.3)
   40000
    2849
    2849
    2849
   40000
   40000
   40000
   40000
v4.4.3 is already installed.
Now using node v4.4.3 (npm v2.15.1)
   40000
    2849
    2849
    2849
   40000
   40000
   40000
   40000
v5.11.0 is already installed.
Now using node v5.11.0 (npm v3.8.6)
   40000
    2849
    2849
    2849
   40000
   40000
   40000
   40000
v6.0.0 is already installed.
Now using node v6.0.0 (npm v3.8.6)
   40000
    2849
    2849
    2849
   40000
   40000
   40000
   40000
$ ./test-buffers.sh
Linux -snip- 3.18.27 #1 SMP Wed Feb 17 01:14:23 UTC 2016 x86_64 GNU/Linux

######################################################################## 100.0%
Now using node v0.10.44 (npm v2.15.0)
Creating default alias: default -> 0.10 (-> v0.10.44)
40000
40000
40000
40000
40000
40000
40000
40000
######################################################################## 100.0%
Now using node v0.12.13 (npm v2.15.0)
40000
40000
40000
40000
40000
40000
40000
40000
Downloading https://iojs.org/dist/v1.8.4/iojs-v1.8.4-linux-x64.tar.gz...
######################################################################## 100.0%
Now using io.js v1.8.4 (npm v2.9.0)
40000
2849
2849
2849
40000
40000
40000
40000
Downloading https://iojs.org/dist/v2.5.0/iojs-v2.5.0-linux-x64.tar.xz...
######################################################################## 100.0%
Now using io.js v2.5.0 (npm v2.13.2)
40000
2849
2849
2849
40000
40000
40000
40000
Downloading https://iojs.org/dist/v3.3.1/iojs-v3.3.1-linux-x64.tar.xz...
######################################################################## 100.0%
Now using io.js v3.3.1 (npm v2.14.3)
40000
2849
2849
2849
40000
40000
40000
40000
Downloading https://nodejs.org/dist/v4.4.3/node-v4.4.3-linux-x64.tar.xz...
######################################################################## 100.0%
Now using node v4.4.3 (npm v2.15.1)
40000
2849
2849
2849
40000
40000
40000
40000
Downloading https://nodejs.org/dist/v5.11.0/node-v5.11.0-linux-x64.tar.xz...
######################################################################## 100.0%
Now using node v5.11.0 (npm v3.8.6)
40000
2849
2849
2849
40000
40000
40000
40000
Downloading https://nodejs.org/dist/v6.0.0/node-v6.0.0-linux-x64.tar.xz...
######################################################################## 100.0%
Now using node v6.0.0 (npm v3.8.6)
40000
2849
2849
2849
40000
40000
40000
40000

It looks to me like this started happening when io.js forked. Perhaps @indutny can shed some light on the subject.

Adding two other possibilities.

  • move parts of process.exit() / process.reallyExit() to new method os.exit(), which may fit what actually is happening better
  • golang panic()- or c++ throw-like stack unwinding.
commented

@Qix- @eljefedelrodeodeljefe Please understand that when you pipe the results you are changing the test. It follows a different code path in node and libuv. You have to observe it on the terminal. So in that regard, the test is more difficult to automate.

I am observing this behavior on Mac OS X 10.9.5. The behavior is different on Mac and Linux. Mac stdout appears to have had blocking writes to the tty historically. See #6297 (comment)

commented

+1 for new function os.exit(Boolean) that drains stdout/stderr upon exit and leave process.exit() as is.

Actually may have to leave process.exit as is because of the prevalence of workarounds to the stdout flushing problem such as node-exit which might break if the behavior of process.exit changes.

If memory serves right, and by looking at the results posted by @Qix-, I think this is where things started to change: libuv/libuv@b197515, because we started to open writable TTYs in non-blocking mode. Follow the commit trail for the reasoning; reverting is not an option.

commented

reverting is not an option.

Yes, and that particular commit also introduced the tty redirection bug in src/unix/tty.c that was fixed in libuv 1.9.0.

@kzc your point being?

commented

Just adding weight to reverting is not an option.

commented

@saghul I will add that prior to the tty redirection fix, Mac stdout appeared to be blocking based on my observations with process.exit tests never truncating tty output on Mac. As of that tty redirect fix it is now non-blocking to make it on par with Linux behavior.

Can I suggest closing out all other issues with a "discussion continues in #6456" comment?

+1 ... we don't need multiple issues covering the same thing.

This is causing some fairly wonky behavior with yargs, two questions:

  1. should we be classifying this as a bug (the new flushing behavior seems unintuitive), if so is there a separate tracking ticket I should be following?
  2. is there a recommended workaround, or should I hold my horses for a patch.

Between commander, yargs, and optimist (all of which now exhibit broken behavior) this is going to be hitting a lot of people (about 1,600,000 installs a day).

commented

@bcoe You won't find consensus on the "process.exit() not flushing stdio" issue because many node devs don't think it's a problem.

If you must use process.exit() the only known workaround is have stdout and stderr block at application start:

process.stdout._handle.setBlocking(true);
process.stderr._handle.setBlocking(true);

Cue the "it's not supported" and "that's not the node way" rebuttals...

@bcoe this is the tracking ticket. If possible don't do ._handle.setBlocking, since this will affect the whole process users start with yargs. There will definitely be no revert on the libuv side and this shouldn't be considered a bug. No-one has come up with a decent workaround, around flushing and properly unwinding on exit. I think it's gonna take a while.

There are workarounds, however, that would be immediately possible but would require refactoring, namely avoiding the programmatic use of exit handlers. Where exactly in the code base is that a problem?

@eljefedelrodeodeljefe the problem is less with yargs itself, and more with the consuming library. With optimist, yargs, and (I would guess) commander, there are commands that force an exit preventing the program consuming the library from attempting to handle the parser output:

var argv = require('yargs')(['--help'])
  .help()
  .argv

console.log('we should never get here');

The above code would never hit the console.log line, and would process.exit(0);. Perhaps this would be an acceptable workaround?

if (shouldExit) {
  process.stdout._handle.setBlocking(true);
  process.stderr._handle.setBlocking(true);
  console.log(yargs.help());
  process.exit(0);
}

That is, avoid setting stdout and stderr to blocking until we already know we are about to exit?

dragging in @bnoordhuis. Have you worked on this in the meantime? Would that be an acceptable hotfix until we come up with a proper solution?

Using process.exit() is common convention in CLI tools. The change in Node.js 6 has pretty much broken everything CLI related... I use process.exit() in meow which a lot of packages depend on (5,312,249 downloads in the last month).

process.exit() will be especially useful when ES2015 modules come to Node.js, as we can then no longer return in the top-scope, so short-circuiting will be effectively impossible without a nesting mess.

Yeah, I know. However, this is really bad, sorry. It's something we need to live with now. Proper handling there would have been "returning from main" or using event emitters. The problems with synchronous and async behaviours are documented though.

Ah, the last point is interesting

The solution probably will be to have a function forcing the flush. From a style point of view this whole exit handler business seems bad still though :(

@Fishrock123 can you pick up @sindresorhus comment on not being able to return from top-scope in ES2015 modules. At least we'll probably need documentation about this.

commented

Perhaps this would be an acceptable workaround?

Only if you can guarantee that nothing else was output previously.

node 6.0.0 on Mac terminal:

$ node -e "console.log('The quick brown fox jumps.\n'.repeat(40000)); process.stdout._handle.setBlocking(true); console.log('Usage: ...'); process.exit(1);"
The quick brown fox jumps.
The quick brown fox jumps.
The quick brown fox jumps.
... 30 lines deleted ...
The quick brown fox jumps.
The quick brown fox jumps.
The quick brown $ 
commented

Patch to flush process.stdout and process.stderr upon process.exit() on unix:

https://github.com/kzc/node/commit/92fc9e0d992f043a4b92d9d286514328f5df1b6d

Tested successfully on Mac. Should work on Linux as well.

No attempt made at a Windows fix, but if one is needed it would follow the same idea in libuv. Not sure if this issue exists on Windows, as a few comments in the code suggest stdout/stderr blocks on that platform.

If someone wants to refine this patch and get it merged into node, go for it.

I might have found something less intrusive: #6735 basically it's my favorite: a no-op :) Should work fine there though, passes tests and is backwards compatible.

Scratch that. Need to refine...

I think @kzc’s suggestion is definitely worth pursuing, but I don’t know the situation on Windows either.

Whoops, only seeing this now. There are some things missing here, standby.

This is the original issue: #784

In it, @vkurchatkin found that this patch "fixes" the issue:

diff --git a/lib/net.js b/lib/net.js
index 030083d..efebd03 100644
--- a/lib/net.js
+++ b/lib/net.js
@@ -135,8 +135,7 @@ function Socket(options) {
     this._handle = createHandle(options.fd);
     this._handle.open(options.fd);
     if ((options.fd == 1 || options.fd == 2) &&
-        (this._handle instanceof Pipe) &&
-        process.platform === 'win32') {
+        (this._handle instanceof Pipe)) {
       // Make stdout and stderr blocking on Windows
       var err = this._handle.setBlocking(true);
       if (err)

There is also significant background in my attempted patch, using the above code: #1771

Namely this, by @bnoordhuis:

A bit of background: some years ago, I think it was in v0.7, it was decided to make stdout and stderr blocking. Turns out it doesn't work so well for pipes; ttys and files are usually very fast (local ones anyway) but pipes tend to fill up rapidly.

A number of people complained about it so we made stdio-to-pipe non-blocking again (except on Windows, where it's not supported.) I forgot the exact bug reports but the theme was that stdio was too slow; on OS X, the kernel pipe buffer is only about 4 kB, so it's easy to max out.

I believe the issue is now that people complain that output sometimes goes missing at program exit. Ideally, we'd have some way to tell libuv "flush only stdio writes, don't do other I/O" but that may not be straightforward to implement.

As an interim solution, this PR seems fine to me, although I can't predict if or how much it will break existing applications.

The last bit is why my patch didn't land. Smoke testing did not exist at the time, the patch is unideal and may break countless things downstream.

There are also links to @bnoordhuis's proposal to fix this in libuv: libuv/libuv#428, however it was decided that it is probably better that node handle this.

I don't seem to recall us ever finding where exactly it appeared though.

If memory serves right, and by looking at the results posted by @Qix-, I think this is where things started to change: libuv/libuv@b197515, because we started to open writable TTYs in non-blocking mode. Follow the commit trail for the reasoning; reverting is not an option.

Parts of this issue should go back to pre-1.0.0 .. perhaps it was amplified recently but this sounds like a conflation of multiple issues now.


This is causing some fairly wonky behavior with yargs, two questions:

  1. should we be classifying this as a bug (the new flushing behavior seems unintuitive), if so is there a separate tracking ticket I should be following?
  2. is there a recommended workaround, or should I hold my horses for a patch.

@bcoe 1. Yes. 2. Avoid process.exit() to preserve chunked stdio writes.

The change in Node.js 6 has pretty much broken everything CLI related...

@sindresorhus This goes back to v1.0.0?

Again, sounds like multiple issues, or amplification of the existing one?

+1 for new function os.exit(Boolean) that drains stdout/stderr upon exit and leave process.exit() as is.

Strongly disagree. This is a bug that ought to be fixed.


My suggestion from "process: add process.exitSoon()" (#6477) is as follows:

Make process.exit() (or rather, void Exit()):

  • uv_stop() (I think) the event loop
    • or whatever to stop anything new from happening but while keeping the threads alive to do writes
  • attempt to flush any data
  • exit

At the same time, I don't think we should alter process.abort().

Edit: it is possible that this is out of scope for process.exit(), but even if we add something new (which should be the fallback), it should have that behavior.

I'm now pretty sure it is within scope, although I'm not sure how possible my idea is.

Note: I have not yet had time to look at @kzc's patch.

uv_stop() (I think) the event loop

or whatever to stop anything new from happening but while keeping the threads alive to do writes

Writes happen in the loop thread, there are no other threads doing the writes. So if the loop is stopped no data will be written.

Note: I have not yet had time to look at @kzc's patch.

The patch is basically what @bnoordhuis proposed but at the handle level instead of a single global function.

Writes happen in the loop thread, there are no other threads doing the writes. So if the loop is stopped no data will be written.

Hmmm. What I mean by that may be more useful then: Shut down as much as possible so no other JS code runs.

I see. So some uv_walk + uv_close all handles except the ones in use for stdio + one last uv_run then.

I was thinking about this too. Seems legit. Is it not possible to do this in streamwrap, too? Attach process.exit to the last write of the stream that is currently happening?
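The "attach process.exit to the last write" idea can be illustrated in user land with the optional callback that `stream.write()` accepts. This is only a sketch of the pattern, not the streamwrap change being discussed: `writeThenExit` is a hypothetical helper, and its `exit` parameter is an assumption added so the function can be exercised without terminating the process.

```javascript
'use strict';

// Hypothetical helper: queue one final chunk and exit only after the stream
// reports that chunk as written. `exit` defaults to process.exit and is
// injectable so the sketch can be tried without ending the process.
function writeThenExit(stream, text, code, exit = process.exit) {
  stream.write(text, () => exit(code));
}
```

For example, `writeThenExit(process.stdout, 'Usage: ...\n', 1)` defers the exit until the write callback fires. Chunks written to the same stream are processed in order, so earlier writes on that stream are handled first; other streams such as stderr would need the same treatment separately.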

@saghul ... I would describe it slightly differently: What we need is essentially a uv_graceful_stop() that:

  1. Puts the loop into a 'stopping' mode that disallows any new requests on handles,
  2. Closes all handles that do not have existing pending requests,
  3. Allows existing pending requests on handles to complete,
  4. Closes the remaining handles when all requests are complete,
  5. Optionally creates a timer to force handles to close / requests to cancel if requests take too long to complete.
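No uv_graceful_stop() exists in libuv; the list above is a proposal. As a rough user-land analogy only, the sketch below models three of the steps (refuse new work once stopping, let pending work finish, give up after a timeout) with promises. The `GracefulStopper` class and its method names are hypothetical and say nothing about how libuv handles or requests would actually be managed.

```javascript
'use strict';

// Hypothetical user-land model of a "graceful stop"; not a libuv or Node.js API.
class GracefulStopper {
  constructor() {
    this.stopping = false;
    this.pending = new Set();
  }

  // Step 1: once stopping, no new work is accepted.
  track(promise) {
    if (this.stopping) {
      throw new Error('stopping: new requests are not accepted');
    }
    this.pending.add(promise);
    const done = () => this.pending.delete(promise);
    promise.then(done, done);
    return promise;
  }

  // Steps 3 and 5: wait for pending work, but no longer than timeoutMs.
  // Resolves to 'drained' or 'timeout'.
  async stop(timeoutMs) {
    this.stopping = true;
    let timer;
    const timeout = new Promise((resolve) => {
      timer = setTimeout(() => resolve('timeout'), timeoutMs);
    });
    const drained = Promise.allSettled([...this.pending]).then(() => 'drained');
    const outcome = await Promise.race([drained, timeout]);
    clearTimeout(timer);
    return outcome;
  }
}
```

A caller would wrap each outstanding operation with `track()` and then decide what to do based on whether `stop()` reports 'drained' or 'timeout'.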
  • Allows existing pending requests on handles to complete,
  • Closes the remaining handles when all requests are complete,

I'm not sure these are within scope, if you're calling process.exit() you are telling the process to ignore other connections and shut down?

  • Optionally creates a timer to force handles to close / requests to cancel if requests take too long to complete.

That's definitely going somewhere beyond this imo

I respectfully disagree. Allowing the existing requests to complete is what this discussion is about, yes? For me the requirement here is to have a graceful exit option. There are times when what you want is to shutdown immediately without completing the pending tasks and there are times when what you want is to shutdown cleanly with pending tasks completed or given a chance to clean up. Personally I do not want (nor do I believe it is necessary) to change the existing behavior of process.exit(). What I want is the ability to simply say, "Hey, we're shutting things down now, please finish what you're doing".

Allowing the existing requests to complete is what this discussion is about, yes?

This is about having stdio finish chunked writes, mostly.

There are times when what you want is to shutdown immediately without completing the pending tasks and there are times when what you want is to shutdown cleanly with pending tasks completed or given a chance to clean up.

I don't disagree, but that isn't process.exit()'s worry.

Personally I do not want (nor do I believe it is necessary) to change the existing behavior of process.exit(). What I want is the ability to simply say, "Hey, we're shutting things down now, please finish what you're doing".

Sure, but this is actually a bug. This used to work, why it doesn't currently is complex and awkward, but we should still fix it. (Also it violates users expectations far beyond just exiting before other connections.)

What we need is essentially a uv_graceful_stop() that:

I'm not sure how useful this is for the general public, seems very Node specific, but let's see:

Puts the loop into a 'stopping' mode that disallows any new requests on handles,

Doable with a flag on the loop.

Closes all handles that do not have existing pending requests,

Not all handles have requests associated with them, and there are also standalone requests. If we close all handles which don't have requests we'd also close the idle and check handles used for process.nextTick, it would turn things into a royal mess.

Allows existing pending requests on handles to complete,

Sure.

Closes the remaining handles when all requests are complete,

We'd need something new here, since currently, if you close a handle with pending requests, they are cancelled (if possible).

Optionally creates a timer to force handles to close / requests to cancel if requests take too long to complete.

Not all requests are cancellable.

Overall I think this is an overkill approach for the problem at hand and I don't see it happening any time soon (unless someone wants to volunteer the time to come up with a thorough design proposal and implementation). This is about flushing stdio streams on exit, which is a relatively specific task IMHO.

commented

This is about having stdio finish chunked writes, mostly.

First the pending uv writes that node no longer has control over, then the chunked node writes.

This used to work

It never truly worked on UNIX, just the write thresholds were somewhat higher. As of node 6.x, Mac buffers 1024 bytes of writes at the OS level and Linux buffers 64KB of writes. Non-blocking writes will always succeed up until this threshold.

FWIW, the patch https://github.com/kzc/node/commit/92fc9e0d992f043a4b92d9d286514328f5df1b6d completely flushes stdio uv writes and node stream chunked writes at process.exit(). I think this behavior is expected by node users.

What we need is essentially a uv_graceful_stop()

I'd be in favor of that as long as no callbacks are invoked by libuv during such a blocking call so that no more work can be scheduled in user land. And only pending writes to stdio and files are flushed. The socket stream data can be in flight at the OS level anyway and no attempt should be made to flush them in my opinion.

It never truly worked on UNIX

Sometime prior to v1.0.0, it actually did, as far as I can tell. Again, conflating the original issue, with the amplification of it. :)

commented

Prior to v1.0.0, it actually did

Before my time and a lot of other node developers.

Instead of looking at history I think we should just address the problems as they stand now.

Instead of looking at history I think we should just address the problems as they stand now.

It's the same problem.

Instead of looking at history I think we should just address the problems as they stand now.

Looking at history is important not to make the same mistake twice. There is tons of knowledge which is unfortunately not very accessible, in the form of commit logs, but one has to know what they're looking for.

Your approach could work, but the libuv bits will need polishing and Windows support.

commented

Looking at history is important not to make the same mistake twice. There is tons of knowledge which is unfortunately not very accessible, in the form of commit logs, but one has to know what they're looking for.

Fair enough.

Your approach could work, but the libuv bits will need polishing and Windows support.

I'm the first to admit my libuv change was a hack. It's a working proof of concept. I was going to code up something to walk the event queue and flush pending write events when I discovered that uv__write already did exactly what I needed.

For the record, does Windows as of node 6.x exhibit this stdio not flushing upon process.exit() problem? The expected fail test case for this issue was not run on Windows suggesting to me that it might have actually worked. My patch moved that expected fail test to be a normal (passing) test.

Edit: the renamed test is here:

https://github.com/kzc/node/blob/92fc9e0d992f043a4b92d9d286514328f5df1b6d/test/parallel/test-stdout-buffer-flush-on-exit.js

I think this is more of a special case @jasnell. If I call process.exit() it's my expectation that the process will exit within a tick; if there's an outbound HTTP request, as an example, I wouldn't want to wait for this to complete prior to killing the process.

However, flushing stdio is an exception:

  • folks writing CLI code have been writing it with the expectation that stdio flushes within one tick, if the process chooses to exit early using process.exit() (CC: @sindresorhus)
  • contrary to an outbound HTTP request, we know that flushing stdout, stderr, etc., should flush in a finite amount of time.

Carving out an exception for stdio keeps the platform's behavior with what the community is accustomed to.

The addition of process.exitCode made this much better, the correct way to exit a node app now that has pending I/O (it called console.log) is to set exit code, and to close all handles, timers, etc. I.e., to do a graceful exit. This can be a bit annoying, in that all your resources need to be tracked and closeable, but they should be, anyhow, if you want to exit gracefully!
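A minimal sketch of the process.exitCode pattern described above, assuming the program has no other open handles or timers keeping the event loop alive. `failGracefully` is a hypothetical helper name; `process.exitCode` itself is the documented property.

```javascript
'use strict';

// Hypothetical CLI helper: report the failure, record the exit status, and
// let the process end on its own once pending stdio writes have completed.
function failGracefully(message, code = 1) {
  console.error(message);
  process.exitCode = code;
}
```

Because process.exit() is never called, the process terminates only after the event loop has nothing left to do, and the recorded code becomes the exit status.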

This works well for us... except for node v0.10, which has no process.exitCode.... on v0.10 you need to emulate it by setting a global code, then doing the graceful resource cleanup... then in the on exit handler call process.exit(code)... which is pretty ugly. I wish we could backport process.exitCode to v0.10.
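The v0.10 emulation described above might look roughly like the following sketch. `pendingExitCode` and `setExitCode` are hypothetical names, and the code is written in ES5 style since it targets v0.10.

```javascript
// Hypothetical module-level state standing in for process.exitCode on v0.10.
var pendingExitCode = 0;

// Record the desired exit status without terminating the process.
function setExitCode(code) {
  pendingExitCode = code;
}

// The 'exit' event fires once the event loop has drained; at that point the
// process is re-exited with the recorded status if it is non-zero.
process.on('exit', function () {
  if (pendingExitCode !== 0) {
    process.exit(pendingExitCode);
  }
});
```

The program calls `setExitCode(1)`, closes its own resources, and lets the loop run dry; the handler then applies the status.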

Part of the problem here is that process.exit() just seems to be a reasonable way to exit node... when it isn't really, it's a "terminate with prejudice" directive. This is compounded by the fact that on Unix only (not Windows), console.log used to have a special exemption to make it blocking if and only if it was a terminal... not if it was piped. So people learnt to rely on this quirk.

Oh, and CLI parsers that are calling console.log and then calling process.exit... that's just doing it wrong. They need to call log and throw an exception, and let the caller catch it and clean up gracefully before exit.
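One way to read the "throw and let the caller catch it" suggestion is sketched below. Everything here is hypothetical: `UsageError`, `parseArgs`, and `main` are illustrative names and do not reflect the API of yargs, commander, optimist, or any other parser.

```javascript
'use strict';

// Hypothetical error type carrying the text to print and the exit status.
class UsageError extends Error {
  constructor(message, exitCode = 1) {
    super(message);
    this.name = 'UsageError';
    this.exitCode = exitCode;
  }
}

// Hypothetical parser: throws instead of printing and calling process.exit().
function parseArgs(argv) {
  if (argv.includes('--help')) {
    throw new UsageError('Usage: tool [options]', 0);
  }
  return { args: argv };
}

// The caller decides how to report the error and which status to exit with.
function main(argv) {
  try {
    return parseArgs(argv);
  } catch (err) {
    if (!(err instanceof UsageError)) throw err;
    console.log(err.message);
    process.exitCode = err.exitCode;
    return null;
  }
}
```

A program would call `main(process.argv.slice(2))` and simply return when it gets `null`. As the replies that follow point out, this changes the contract between a parser and its consumers, so it is not a drop-in fix for existing libraries.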

@sam-github this is a pattern that has been used by almost every CLI application in the ecosystem since the beginning of time, to terminate execution before it applies to the libraries' consumer -- the alternative would be literally 10000+ consuming libraries changing their contract with commander, yargs, optimist, meow, tap, the list goes on.

Saying "you should just do a graceful exit always" is not a valuable answer in this case, in the sense that it does not meaningfully move us towards a solution state. (Unless, of course, it's followed up with several dozen (hundred?) pull requests to refactor the many impacted programs in the suggested manner.)

The fact is that the contract changed, pretty dramatically. Maybe not the intended or documented contract, but the actual contract (as in, the way that node actually works) changed very dramatically here.

It's fine to point the finger at the userbase and say "Welp, they were doing it wrong, too bad", but the platform is mature and that kind of casual breakage is foolish. Many extant programs won't work on Node 6 because of this. Who's right or wrong hardly matters if users can't use your platform, and complaining about an un-boiled ocean doesn't solve the problem.

What's the sense in arguing when you're all alone?

I think both positions are arguable and can be combined in a fix in node. But I also think use of process.exit() was always, and I cannot stress this enough, always wrong and should be slowly discouraged. By the way, so was process.reallyExit, so it's just historic... very historic. Looking at the repos in question, fixes for gracefully exiting seem quite easy though.

@eljefedelrodeodeljefe I'm comfortable with this, I'm fine with pushing people towards the new behavior with yargs@5.x, however there are 13,000 consuming libraries on prior versions and I don't want to break them.

@eljefedelrodeodeljefe Up until relatively recently, there was literally no other way to set the exit code of a node process other than calling process.exit(), and it's been safe to assume that stdio streams would synchronously flush on exit for a very long time.

It's fine and good to complain about what's right or wrong. Ok, it was wrong to call process.exit, always and forever, we should feel shame for this. So we feel shame. Who cares? Node 6 broke a very significant number of node programs with this change in behavior.

So do we care about that breakage or not? It seems to me that turning our back on tens of thousands of broken programs is a choice that should not be taken lightly!

@bcoe @isaacs sure, I care deeply about fixing it, especially because the use is so widespread and CLIs are so crucial for the ecosystem. I haven't seen a decent fix yet :( My point is just that it should be a combined effort. If you see ways of mitigating this in future versions of the repos, please go ahead immediately, and node should fix everything backwards.

Funny enough I was suggesting using the EE instead of exit handlers as main control flow of a CLI some months ago but didn't see at the time that it's gonna explode that much.

Let's come up with good ideas over the weekend! :)

commented

@eljefedelrodeodeljefe Not sure what you're looking for in a fix that isn't some variation of the simple patch I proposed. stdio libuv queued writes and queued node chunked writes have to be flushed in a sync manner prior to exit(). That's all.
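Until something like that lands in node itself, a userland approximation is possible. The sketch below is not the patch discussed in this thread and does not flush synchronously: it assumes a hypothetical `exitAfterFlush` helper that writes an empty chunk to each stream and exits only once every write callback has fired.

```javascript
// Hypothetical userland helper, not the libuv/node patch discussed here.
// It writes an empty chunk to each stream and waits for the write callbacks,
// which fire only after everything queued earlier on that stream was handled.
function exitAfterFlush(code, streams, exit) {
  streams = streams || [process.stdout, process.stderr];
  exit = exit || process.exit; // injectable so the helper can be tested
  let pending = streams.length;
  if (pending === 0) return exit(code);
  streams.forEach((stream) => {
    stream.write('', () => {
      if (--pending === 0) exit(code);
    });
  });
}

// Usage: replace a bare process.exit(1) with
// exitAfterFlush(1);
```

Because the exit is deferred, code after the call keeps running, so this is not a drop-in replacement for callers that rely on process.exit() halting immediately.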

and it's been safe to assume that stdio streams would synchronously flush on exit for a very long time.

Node 6 broke a very significant number of node programs with this change in behavior.

Can someone point me to exactly how the behavior changed in v6? As far as I am aware, it is only easier to trigger, and that it is indeed the issue I mentioned previously. (I.e. writes become chunked at a smaller size) Is that true? (@saghul?)

@kzc submit it as a PR here.

commented

@Qix- I just put together the proof of concept to show this issue could be solved in a straightforward manner. I'm hoping someone else would run with it and do the Windows piece - assuming it is required. Because the patch touches libuv it might be better for those folks to do that part. Presently uv_flush_sync, a.k.a. uv__write, does not return any error code. Inspecting such an error code would be wise before flushing the node-side stream chunks, otherwise there could be a gap in the data. Or perhaps the libuv team would prefer to implement some variation of @jasnell's uv_graceful_stop() proposal. Either would be fine with me.

commented

@saghul Is there a libuv function that provides the number of bytes pending to be written for a uv_stream_t*? That info would be sufficient to know whether the flushing of the node-side chunks can proceed - i.e., when the libuv bytes pending to be written for the stream is 0. If this function does not already exist it would be useful if uv__write() were to return that value.

Can someone point me to exactly how the behavior changed in v6? As far as I am aware, it is only easier to trigger,

That's my understanding. While we were doing async writes before 1.0, it's possible that the pty fix which landed in 1.9.0 also fixed it on OSX, thus making them really async.

I think we haven't understood 100% where the problem is. We do see its effect. So, it would be interesting to take Node 5 and see if stdout writes block or not on OSX. Then the same with Node 6 (which do not block).

Is there a libuv function that provides the number of bytes pending to be written for a uv_stream_t*?

@kzc see: http://docs.libuv.org/en/v1.x/stream.html#c.uv_stream_t.write_queue_size
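write_queue_size is a C-side field. On the JavaScript side, the closest documented figure is `writable.writableLength` in newer Node versions, which counts bytes the stream layer has accepted but not yet finished writing. A minimal sketch, assuming a hypothetical `pendingBytes` helper and a stand-in sink that never completes its writes:

```javascript
const { Writable } = require('stream');

// Hypothetical helper: bytes the Node stream layer has accepted but whose
// write callback has not fired yet. This is a JavaScript-side figure and is
// not the same thing as libuv's write_queue_size.
function pendingBytes(stream) {
  return stream.writableLength;
}

// A stand-in sink that never completes its writes, so data stays queued.
const stalled = new Writable({
  write(chunk, encoding, callback) {
    // Intentionally never call callback(): simulates a blocked consumer.
  },
});

stalled.write('hello');  // 5 bytes, in flight
stalled.write('world!'); // 6 bytes, queued behind the first chunk
console.log(pendingBytes(stalled)); // prints 11
```

A value of 0 here would only say that node's own queue is empty; whether libuv still holds data is a separate question, which is what the field linked above answers.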

it's possible that the pty fix which landed in 1.9.0 also fixed it on OSX, thus making them really async.

I don't think so, if you run https://github.com/nodejs/node/blob/master/test/known_issues/test-stdout-buffer-flush-on-exit.js on an OS X machine prior to that it also happens as expected.

I think we haven't understood 100% where the problem is. We do see its effect.

My understanding at the time when I investigated it with bnoordhuis was that flushing wasn't happening at the OS level, I think?

So, it would be interesting to take Node 5 and see if stdout writes block or not on OSX. Then the same with Node 6 (which do not block).

Hmmmm, I'm quite certain it did not block but I don't have proof of that, or at least not off-hand.

commented

Revised process.exit fix with improved error checking: https://github.com/kzc/node/commit/29997921800e00a22d9f92d24704a0021be03bbf

commented

@Fishrock123 The patch https://github.com/kzc/node/commit/29997921800e00a22d9f92d24704a0021be03bbf renames the test you mentioned to test/parallel/test-stdout-buffer-flush-on-exit.js and runs it successfully on Mac and Linux.

for the public record, here's the workaround that I'm about to land for yargs (commander should be able to use a similar approach):

yargs/yargs#501

I did some more digging for the yargs issue. Here's some old discussion of "Fix blocking / non-blocking stdio woes": nodejs/node-v0.x-archive#3584

Currently (At the time of linked issue) process.stdin / stdout / stderr is blocking, except when it is a pipe on windows. Weird and surprising. Very unpractical in cases where stdio is used as an IPC mechanism between node processes.

Also, net.Socket#_handle.setBlocking() appears to have been added in 20176a9 (v0.11.2)

created a shim here for anyone else running into this issue.

commented

@bcoe Be aware that calling setBlocking(true) is not a cure all. If a large write takes place before setBlocking(true) it does not work.

See: #6456 (comment)

That's why the process.exit() issue should be addressed in node itself and back ported.
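For anyone applying the workaround anyway, the sketch below wraps the snippet quoted at the top of this issue in a hypothetical `makeStdioBlocking` helper. Note that `_handle.setBlocking` is an internal, undocumented API, and that the helper name and return value are assumptions made for illustration.

```javascript
// Hypothetical helper wrapping the workaround quoted in this thread.
// _handle.setBlocking is an internal, undocumented API and may change.
function makeStdioBlocking(streams) {
  const changed = [];
  (streams || [process.stdout, process.stderr]).forEach((s) => {
    if (s && s.isTTY && s._handle && typeof s._handle.setBlocking === 'function') {
      s._handle.setBlocking(true);
      changed.push(s);
    }
  });
  return changed; // streams that were actually switched to blocking mode
}

// Call this first, before any output is written: per the caveat above,
// a large write issued before setBlocking(true) can still be truncated.
makeStdioBlocking();
```

The helper only touches TTY streams, matching the original snippet, so piped or redirected output is left as it was.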

Is test-stdout-buffer-flush-on-exit.js supposed to be reliable? It fails for me on Node 4, 5 and 6 on OSX and on Node 5 and 6 on Linux (haven't tested 4 there).

Is there a reliable test we can run with git bisect to try to understand where the problem originates? This one also fails for me with Node 4 on Linux. I'm really confused now. :-S

commented

@saghul test-stdout-buffer-flush-on-exit.js is an expected-fail on node 4, 5 and 6 (without my patch).

Here's another program that fails to run as expected on node 4, 5 and 6 in a unix terminal:

// this program populates the libuv write queue upon first write over 64K
// then will populate the node stream chunk queue for subsequent writes.
for (var i = 1; i <= 1000; ++i) {
  process.stdout.write((i + 
    ': The quick brown fox jumps over the lazy dog.\n').repeat(1500));
}
process.exit(1);

It will run successfully on Mac and Linux with https://github.com/kzc/node/commit/29997921800e00a22d9f92d24704a0021be03bbf
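To check for truncation mechanically rather than by eye, one option is to run a script as a child process and compare the bytes received with the bytes it should have written. The sketch below assumes a hypothetical `measureOutput` harness. It exercises the piped path only, not the tty path, and its scaled-down child exits normally, so it is a baseline rather than a reproduction.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical harness: run a child script with stdout piped and compare the
// bytes received with the bytes the script was supposed to write.
function measureOutput(childSource, expectedBytes) {
  const result = spawnSync(process.execPath, ['-e', childSource], {
    maxBuffer: 64 * 1024 * 1024, // leave room for large outputs
  });
  const received = result.stdout.length;
  return { received, expectedBytes, truncated: received < expectedBytes };
}

// Same shape as the program above, scaled down, and exiting normally.
const line = ': The quick brown fox jumps over the lazy dog.\n';
const child = `
  for (var i = 1; i <= 3; ++i) {
    process.stdout.write((i + ${JSON.stringify(line)}).repeat(2));
  }
`;
const expected = [1, 2, 3]
  .map((i) => Buffer.byteLength((i + line).repeat(2)))
  .reduce((a, b) => a + b, 0);

console.log(measureOutput(child, expected));
```

To probe the behavior discussed here, raise the loop bounds to the original 1000 and 1500, append process.exit(1) to the child source, and adjust the expected byte count to match.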

@kzc I know your patch will make it work. What I want to know is what and where changed subtly between Node 4 and Node 6, since the core principle (writes being async) remained.

IOW, we need a test which passes in 4 (and maybe 5) but doesn't in 6.

commented

The behavior only noticeably changed on Mac with node 6.0.0/libuv 1.9.0 at the tty: #6456 (comment)

Do not pipe or redirect the output as it changes the test and a different code path is taken in node. test-stdout-buffer-flush-on-exit.js is a pipe test, not a tty test.

It appears as if the fd is blocking at the tty in node 4 and 5 on Mac, but non-blocking on node 6 at the tty.

Linux behavior is the same in node 4, 5 and 6 - 64K is output upon process.exit() at tty before truncation.

commented

But even when stdio is piped, node 4, 5 and 6 never completely flushed stdio upon process.exit(). I believe these stdio streams ought to be completely flushed regardless of being run at the tty or piped or redirected to a file.

@saghul known_issues tests fail if the bug exists. Contrary to what @kzc says, this bug has existed since Node.js (io.js) v1.0.0. It does not appear to exist in v0.12 (.x) or before. The recent libuv 1.9.0 patch appears to have made it so that chunked writes are triggered at much smaller buffer(?) sizes, meaning the bug appears far more easily.

It appears as if the fd is blocking at the tty in node 4 and 5 on Mac, but non-blocking on 6 at the tty.

@kzc This isn't correct as far as my investigations have gone (see above..), but when chunked only the first chunk is flushed on (fast) exit.

commented

@Fishrock123 I'm aware of what known_issues is for. I did not mention node v1.0.0, nor v0.12.

tty behavior is different than piped behavior.

My patch fixes flushing of stdio at the tty or when piped upon process.exit().

tty behavior is different than piped behavior.

Ah, right. Perhaps that has changed. I have not taken a look.

commented

See also: #6456 (comment)

Hmmmmmm, looks like it's not in v1.0.0 after all but rather v1.0.2 (close enough).

(I forgot that I had to run iojs on older versions instead of node.)

Bisect of v1.0.0 and v2.0.0:

07bd05ba332e078c1ba76635921f5448a3e884cf is the first bad commit
commit 07bd05ba332e078c1ba76635921f5448a3e884cf
Author: Saúl Ibarra Corretgé <saghul@gmail.com>
Date:   Wed Jan 14 20:26:02 2015 +0100

    deps: update libuv to 1.2.1

    PR: https://github.com/iojs/io.js/pull/423
    Reviewed-by: Ben Noordhuis <info@bnoordhuis.nl>
    Reviewed-by: Bert Belder <bertbelder@gmail.com>

:040000 040000 f2ba9fd93434ec2457a5e53d5c047efedad70462 789f762d0ea0e69993460303430f5851bf1de1a0 M  deps
bisect run success

@saghul That's 07bd05b

From the libuv 1.2.1 changelog: * unix: set non-block mode in uv_{pipe,tcp,udp}_open (Ben Noordhuis)

Maybe that's the culprit? (of the original issue?)

@kzc Can you open a PR with your code? We will be better able to move forward in that format.

From the libuv 1.2.1 changelog: * unix: set non-block mode in uv_{pipe,tcp,udp}_open (Ben Noordhuis)

Maybe that's the culprit? (of the original issue?)

Yes, that's the original change. But we already knew that. The question is what made the thing worse between 4 and 6. The only change I can think of is: libuv/libuv@387102b but I can't see how opening /dev/tty or /dev/ttys00X changes anything.

Yes, that's the original change. But we already knew that.

(We did?)

commented

@Fishrock123 I'd rather someone else put the PR together as there will likely be changes required for windows libuv support and additional error handling if flushSync is to be used in a non-process.exit() context that I don't have the time for. I just wanted to get the ball rolling with a working proof of concept and don't care about attribution for the patch.

I've adapted the proposed fix into #6773 for further development.

@Fishrock123

Can someone point me to exactly how the behavior changed in v6?

The change in the chunk size is exactly the behavior that changed, and the change in behavior is significant enough to be very relevant.

Here's a test script: https://gist.github.com/isaacs/1495b91ec66b21d30b10572d72ad2cdd

The number of test characters can be set by the first argument, and you can pass noexit as the second argument to make it not call process.exit().

When run with stdout and stderr both outputting to the terminal, the expected behavior is to print out a series of o to the terminal, then a capital O, carriage return, followed by a series of x, then capital X, carriage return, and then process exit. This works up to the point where it maxes out my terminal's buffer.

In versions of Node prior to 6.0, that expectation was always met, even when running node issue-6456.js 100000. In 6.0 and 6.1, the stdout and stderr are munged together, and if process.exit() is called, then the streams don't end, for any value above 1kb.

If the strings are not just x's and o's (for example, control characters returning the cursor to the start of the line in order to do a progress bar, like a certain CLI used by most node devs), then it gets even more wild, because the overlap can occur at unexpected places. Pass fancy as the third argument to the script in order to test this.

$ node issue-6456.js 100000 x fancy
ooooooooooooooooooooooooO
xxxxxxxxxxxxxxxxxxxxxxxxX

$ nave use 6.0

$ node issue-6456.js 100000 x fancy
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx            # <-- note, no CR here
$ node issue-6456.js 100000 noexit fancy
ooooooooooooooooooooooooOxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxX

commented

Regardless of the change introduced in node 6.0.0, stdout and stderr were never fully flushed at process.exit() on all platforms in node 4 and 5. Node on Linux only flushes up to 64K at the tty, for example. I think we should fix this problem once and for all and fully flush stdio streams upon process.exit() to match the behaviour of a "regular" non-process.exit() program.

@isaacs Thanks for the info, could you clarify what you mean by this?

In 6.0 and 6.1, the stdout and stderr are munged together


then it gets even more wild, because the overlap can occur at unexpected places.

Oh my, interesting.


Hmmm, I had forgotten the original issue didn't affect TTYs and only pipes. Strange that there doesn't seem to be a directly related libuv change.