edenhill / kcat

Generic command line non-JVM Apache Kafka producer and consumer

stdout buffer causes messages not to be delivered on arrival

leroix opened this issue

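By default, stdout is buffered. When writing to a terminal it is line-buffered, so it flushes after each newline. When stdout is redirected to a pipe or a file it is fully buffered, so output is held back until the buffer fills or the program exits, and messages are not delivered as they arrive. The stream can be flushed manually, or buffering can be disabled.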

http://stackoverflow.com/questions/1716296/why-does-printf-not-flush-after-the-call-unless-a-newline-is-in-the-format-strin

Something like this could fix the problem:

int main (int argc, char **argv) {
        /* Disable stdout buffering; must run before any output is written. */
        setbuf(stdout, NULL);

Or the buffer can be flushed manually after each write.
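
The two options above can be sketched roughly as follows. This is a minimal illustration, not kcat's actual code: the helper names `make_unbuffered` and `write_message` are hypothetical, and the sketch assumes each message is written as its payload followed by a newline.

```c
#include <assert.h>
#include <stdio.h>

/* Option A: turn buffering off for the stream.
 * Must be called before any other operation on `out`.
 * Returns 0 on success, non-zero on failure. */
int make_unbuffered(FILE *out) {
    assert(out != NULL);
    return setvbuf(out, NULL, _IONBF, 0);
}

/* Option B: keep the stream buffered, but flush after every message.
 * Returns 0 on success, -1 on a write or flush error. */
int write_message(FILE *out, const char *payload, size_t len) {
    assert(out != NULL);
    if (fwrite(payload, 1, len, out) != len)
        return -1;
    if (fputc('\n', out) == EOF)
        return -1;
    return fflush(out) == 0 ? 0 : -1;
}
```

Either option alone is enough to get a message out as soon as it is written. Option B keeps the stdio buffer in place, which matters only if some writes are allowed to skip the flush.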

I made this an issue rather than a PR because I'm not sure of the best way to handle it.

Would a command-line option to turn on unbuffered I/O be sufficient?
I'm not 100% certain unbuffered I/O won't have adverse performance impacts in some cases (e.g. small messages), and would therefore not want to make this the default at this point.
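
A rough sketch of what such an opt-in switch might look like, using POSIX `getopt`. The flag letter `-u` is taken from a later comment in this thread; the helper names `parse_unbuffered_flag` and `configure_output` are hypothetical and not taken from kcat's source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: scan argv for -u.
 * Returns 1 if -u was given, 0 if not, -1 on an unknown option. */
int parse_unbuffered_flag(int argc, char **argv) {
    int opt;
    int unbuffered = 0;

    assert(argv != NULL);
    optind = 1; /* restart option scanning from the first argument */
    while ((opt = getopt(argc, argv, "u")) != -1) {
        if (opt == 'u')
            unbuffered = 1;
        else
            return -1; /* getopt has already reported the bad option */
    }
    return unbuffered;
}

/* Hypothetical helper: apply the choice to the output stream.
 * Buffered stays the default; -u switches to unbuffered.
 * Must be called before anything is written to `out`. */
int configure_output(FILE *out, int unbuffered) {
    assert(out != NULL);
    return unbuffered ? setvbuf(out, NULL, _IONBF, 0) : 0;
}
```

Keeping buffered output as the default means existing high-throughput pipelines are unaffected unless the user asks for `-u`.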

Yea agreed. Making it configurable would enable it to work in either case
👍
On Jun 8, 2014 1:29 PM, "Magnus Edenhill" notifications@github.com wrote:

Would a command-line option to turn on unbuffered I/O be sufficient?
I'm not 100% certain unbuffered I/O won't have adverse performance impacts
in some cases (e.g. small messages), and would therefore not want to make
this the default at this point.

Shouldn't there still be a flush after a given timeout, though? I can't
think of a case where it'd be OK for data to sit in the stdout buffer
indefinitely until more data is buffered.
On Jun 8, 2014 2:05 PM, "Justin Ratner" leroix08@gmail.com wrote:

Yea agreed. Making it configurable would enable it to work in either case
👍

You are probably right, I'll look into it.
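
For reference, the timeout-based flush suggested above could be sketched like this. It is only an illustration of the idea: `flush_if_due` is a hypothetical helper, and it assumes the consume loop calls it on every iteration, including iterations where no message arrived, since nothing else would trigger the flush.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: flush `out` if at least `interval_sec` seconds have
 * passed since *last_flush. On a successful flush, *last_flush is set to `now`.
 * Returns 1 if flushed, 0 if not yet due, -1 on a flush error. */
int flush_if_due(FILE *out, time_t *last_flush, time_t now, double interval_sec) {
    assert(out != NULL && last_flush != NULL);
    if (difftime(now, *last_flush) < interval_sec)
        return 0;
    if (fflush(out) != 0)
        return -1;
    *last_flush = now;
    return 1;
}
```

A caller would pass `time(NULL)` as `now`. Taking the time as a parameter keeps the helper easy to exercise without waiting on a real clock.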

OK thinking about it a little more. A command-line arg to set buffered
true/false would be the simplest solution. Worrying about flushing after a
timeout would probably be getting ahead of ourselves.
On Jun 8, 2014 2:08 PM, "Justin Ratner" leroix08@gmail.com wrote:

Shouldn't there still be a flush after a given timeout though? I can't
think of a case where it'd be OK for data to sit in the stdout buffer
indefinitely unless more data was buffered

Yeah, the downside of -u is the syscall overhead when consuming lots of small messages.
When unbuffered, each message triggers its own write(), while in buffered mode many messages accumulate in the buffer before a single write().
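
The difference can be observed with a small experiment. The sketch below is an illustration under stated assumptions, not part of kcat: the hypothetical function `bytes_delivered_before_flush` writes three short messages to a pipe through a stdio stream and then checks, without flushing, how many bytes have actually reached the pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical experiment: write three 6-byte messages through a stdio
 * stream attached to a pipe, then report how many bytes are readable from
 * the pipe before any explicit flush. Returns -1 on setup failure. */
long bytes_delivered_before_flush(int unbuffered) {
    int fds[2];
    char sink[256];
    FILE *out;
    ssize_t n;
    int i;

    assert(unbuffered == 0 || unbuffered == 1);
    if (pipe(fds) != 0)
        return -1;
    out = fdopen(fds[1], "w");
    if (out == NULL) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* The buffering mode must be chosen before the first write. */
    setvbuf(out, NULL, unbuffered ? _IONBF : _IOFBF, BUFSIZ);

    for (i = 0; i < 3; i++)
        fprintf(out, "msg %d\n", i);

    /* Non-blocking read: fails with EAGAIN if nothing has been written yet. */
    fcntl(fds[0], F_SETFL, O_NONBLOCK);
    n = read(fds[0], sink, sizeof sink);

    fclose(out);   /* flushes whatever is still buffered */
    close(fds[0]);
    return n < 0 ? 0 : (long)n;
}
```

With `unbuffered` set to 1, all 18 bytes are already in the pipe when the read happens, because every `fprintf` went straight to `write()`. With `unbuffered` set to 0, the read finds nothing, because the messages are still sitting in the stdio buffer waiting for one larger `write()`.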