Tyrrrz / CliWrap

Library for running command-line processes

Stdout and stderr order not preserved

aetonsi opened this issue · comments

Version

3.6.0

Details

I was trying to write the entire invocation's output (both stdout and stderr) to a file, but the order of output and error data always comes out scrambled.
AFAIK, maintaining the correct output order is a common problem when capturing command-line programs, presumably due to output buffering.

Steps to reproduce

As a test, I used a simple PHP invocation (all the following scripts are run in PowerShell):

PS> php -r "echo ' o111 '; fwrite(STDERR, ' E222 '); fwrite(STDERR,' E333 '); echo ' o444 '; "
# and the output is:
o111  E222  E333  o444

But if I invoke that from a C# app via CliWrap, everything I try messes up the output in some way.
I tried writing to the console output stream, to a StringBuilder, and to a file; none of them preserves the order.
Here's a simple C# test program:

using CliWrap;
using System.Text;

string program = "php";
string[] program_args = new[] { "-r", "echo ' o111 '; fwrite(STDERR, ' E222 '); fwrite(STDERR,' E333 '); echo ' o444 '; " };

string fp = @"c:\tmp\buff.txt";
FileStream fs = System.IO.File.Open(fp, FileMode.Create);
StringBuilder sb = new StringBuilder();
PipeTarget t = PipeTarget.Merge(new[] {
    PipeTarget.ToStream(Console.OpenStandardOutput()),
    PipeTarget.ToStringBuilder(sb),
    PipeTarget.ToStream(fs)
});

Command cmd = Cli.Wrap(program)
    .WithArguments(program_args, true)
    .WithValidation(CommandResultValidation.None) | (t, t);

Console.Write(","); // to create CSV structure (it concatenates to the native php call output)
await (cmd.ExecuteAsync()); // output: console standard output stream
Console.Write("," + sb.ToString()); // output: string builder
fs.Close();
Console.Write("," + System.IO.File.OpenText(fp).ReadToEnd()); // output: file contents

And here's a PowerShell script to compare the correct native output with CliWrap's attempts:

# it loops 10 times
1..10 | ForEach-Object {
    &  php -r "echo ' o111 '; fwrite(STDERR, ' E222 '); fwrite(STDERR,' E333 '); echo ' o444 '; "
    & .\myprogram
    ''
}

The output is intentionally in CSV format, so here's the result converted to a Markdown table:

| direct php call | console | string builder | file |
| --- | --- | --- | --- |
| o111 E222 E333 o444 | E222 o111 E333 o444 | o111 o444 E222 E333 | o111 E222 E333 o444 |
| o111 E222 E333 o444 | E222 o111 E333 o444 | o111 E222 E333 o444 | o111 E222 o444 E333 |
| o111 E222 E333 o444 | E222 o111 E333 o444 | o111 o444 E222 E333 | o111 E222 o444 E333 |
| o111 E222 E333 o444 | o111 E222 o444 E333 | E222 E333 o111 o444 | E222 o111 E333 o444 |
| o111 E222 E333 o444 | E222 o111 E333 o444 | E222 E333 o111 o444 | o111 E222 o444 E333 |
| o111 E222 E333 o444 | o111 E222 o444 E333 | o111 E222 o444 E333 | o111 E222 o444 E333 |
| o111 E222 E333 o444 | o111 E222 o444 E333 | E222 o111 E33 o444 | E222 o111 E333 o444 |
| o111 E222 E333 o444 | o111 E222 o444 E333 | o111 o444 E222 E333 | o111 E222 o444 E333 |
| o111 E222 E333 o444 | E222 E333 o111 o444 | o111 E222 E333 o444 | E222 o111 E333 o444 |
| o111 E222 E333 o444 | E222 o111 E333 o444 | o111 E222 E33 o444 | E222 o111 E333 o444 |

As you can see, when invoking php directly, the output is always in the correct order. When piped via CliWrap, it's randomly out of order, and sometimes even partially truncated (E33 instead of E333 on a couple of lines).

Is there any way to fix this ordering problem, or to pipe everything to a single output and keep it in the correct order?


thank you
have a nice day

If you run your php script in a command shell and pipe the stdout/stderr to separate files, does it maintain order?

Yeah, you can try it yourself if you want; I simply used PHP as an easy way to produce irregularly alternating output and error data.
You can create a simple PHP script with randomly alternating stdout/stderr writes like this (pwsh):

'<?php ' > .\script.php
1..10 | ForEach-Object {
    $stream = @('STDOUT', 'STDERR') | Get-Random
    $text = "$("$stream".Substring(3))$_"
    "fwrite($stream, '$text ');" >>script.php
}

That generates a script.php like this:

<?php 
fwrite(STDOUT, 'OUT1 ');
fwrite(STDERR, 'ERR2 ');
fwrite(STDERR, 'ERR3 ');
fwrite(STDERR, 'ERR4 ');
fwrite(STDOUT, 'OUT5 ');
fwrite(STDERR, 'ERR6 ');
fwrite(STDOUT, 'OUT7 ');
fwrite(STDOUT, 'OUT8 ');
fwrite(STDERR, 'ERR9 ');
fwrite(STDOUT, 'OUT10 ');

And if you invoke the same script 10 times, the output is always the same (note the >>stdout.log 2>>stderr.log redirection splitting the output):

1..10 | ForEach-Object {
    & php -f script.php >>stdout.log 2>>stderr.log
}

It generates the following files:

stdout.log
OUT1 OUT5 OUT7 OUT8 OUT10 
OUT1 OUT5 OUT7 OUT8 OUT10 
OUT1 OUT5 OUT7 OUT8 OUT10 
OUT1 OUT5 OUT7 OUT8 OUT10 
OUT1 OUT5 OUT7 OUT8 OUT10 
OUT1 OUT5 OUT7 OUT8 OUT10 
OUT1 OUT5 OUT7 OUT8 OUT10 
OUT1 OUT5 OUT7 OUT8 OUT10 
OUT1 OUT5 OUT7 OUT8 OUT10 
OUT1 OUT5 OUT7 OUT8 OUT10 
stderr.log
ERR2 ERR3 ERR4 ERR6 ERR9 
ERR2 ERR3 ERR4 ERR6 ERR9 
ERR2 ERR3 ERR4 ERR6 ERR9 
ERR2 ERR3 ERR4 ERR6 ERR9 
ERR2 ERR3 ERR4 ERR6 ERR9 
ERR2 ERR3 ERR4 ERR6 ERR9 
ERR2 ERR3 ERR4 ERR6 ERR9 
ERR2 ERR3 ERR4 ERR6 ERR9 
ERR2 ERR3 ERR4 ERR6 ERR9 
ERR2 ERR3 ERR4 ERR6 ERR9 

Problems arise only when using CliWrap in C#.
I am using | (t, t) to pipe both streams to the same FileStream object; shouldn't this work? Or is it wrong?

I am using | (t, t) to pipe both streams to the same FileStream object; shouldn't this work? Or is it wrong?

That is correct. However, the stream implementation itself may not be thread-safe. You can try using Stream.Synchronized(...) although I think that only affects sync methods and probably won't help much in this case anyway.

Yes, as you guessed, using a synchronized stream doesn't make a difference:

Stream ss = Stream.Synchronized(fs);
PipeTarget t = PipeTarget.Merge(new[] {
    PipeTarget.ToStream(ss)
});

Same result.

The StringBuilder is even worse: it loses pieces of output along the way:

test code
using CliWrap;
using System.Text;

string program = "php";
string[] program_args = new[] { "-f", "script.php" };
string fp = @".\cliwrap.log";

StringBuilder sb = new StringBuilder();
PipeTarget t = PipeTarget.Merge(new[] {
    PipeTarget.ToStringBuilder(sb)
});

await (
    Cli.Wrap(program)
    .WithArguments(program_args, true)
    .WithValidation(CommandResultValidation.None) | (t, t)
).ExecuteAsync();


File.AppendAllText(fp, sb.ToString());
File.AppendAllText(fp, Environment.NewLine);
StringBuilder result
ERR2 OUT1 ERR3 ERR4 ERR6 ERR9 OUT5 OUT7 OUT8 OUT10 
ERR2 OUT1 OUT5 OUT7 OUT8 OUT10 ERR3 ERR4 ERR6 ERR9 
ERR2 ERR3 ERR4 ERR6 ERR9 OUT5 OUT7 OUT8 OUT10 
ERR2 OUT1 ERR3 ERR4 ERR6 ERR9 OUT5 OUT7 OUT8 OUT10 
OUT1 ERR2 OUT5 OUT7 OUT8 OUT10 ERR3 ERR4 ERR6 ERR9 
OUT1 ERR2 ERR3 ERR4 ERR6 ERR9 OUT5 OUT7 OUT8 OUT10 
OUT1 OUT5 OUT7 OUT8 OUT10 ERR3 ERR4 ERR6 ERR9 
OUT1 OUT5 OUT7 OUT8 OUT10 ERR3 ERR4 ERR6 ERR9 
ERR2 OUT1 ERR3 ERR4 ERR6 ERR9 OUT5 OUT7 OUT8 OUT10 
ERR2 OUT1 ERR3 ERR4 ERR6 ERR9 OUT5 OUT7 OUT8 OUT10 

The interesting thing is that with PipeTarget.ToFile(...) the order is correct and the data is not broken up; it is simply missing all of the stderr data:

code
using CliWrap;

string program = "php";
string[] program_args = new[] { "-f", "script.php" };
string fpr = @".\cliwrap.log" + (new Random().Next(1, 10000));

PipeTarget t = PipeTarget.Merge(new[] {
    PipeTarget.ToFile(fpr),
});

await (
    Cli.Wrap(program)
    .WithArguments(program_args, true)
    .WithValidation(CommandResultValidation.None) | (t, t)
).ExecuteAsync();
log (reading all the "cliwrap.log*" files that the script generates; one file's content per line)

to read all files: Get-Content -Path cliwrap.log*

OUT1 OUT5 OUT7 OUT8 OUT10 
OUT1 OUT5 OUT7 OUT8 OUT10 
OUT1 OUT5 OUT7 OUT8 OUT10 
OUT1 OUT5 OUT7 OUT8 OUT10 
OUT1 OUT5 OUT7 OUT8 OUT10 
OUT1 OUT5 OUT7 OUT8 OUT10 
OUT1 OUT5 OUT7 OUT8 OUT10 
OUT1 OUT5 OUT7 OUT8 OUT10 
OUT1 OUT5 OUT7 OUT8 OUT10 
OUT1 OUT5 OUT7 OUT8 OUT10 

Okay, so your last comment suggests more to me that it might be a thread-safety issue. Try this:

using var semaphore = new SemaphoreSlim(1, 1);
var stringBuilder = new StringBuilder();

var target = PipeTarget.ToDelegate(async line =>
{
    await semaphore.WaitAsync();
    try
    {
        stringBuilder.AppendLine(line);
    }
    finally
    {
        semaphore.Release();
    }
});

var cmd = Cli.Wrap(...) | (target, target);

Haven't tested the above code myself so please review it.

I came back to investigate a bit more and reproduced the original scenario with the following cmd script:

@echo off
setlocal EnableDelayedExpansion

for /L %%i in (1,1,100) do (
  set /A "remainder=%%i %% 2"
  if !remainder! == 0 (
    echo ERR %%i 1>&2
  ) else (
    echo OUT %%i
  )
)

Running the following maintains order:

$ test.bat > out.txt 2>&1

The following CliWrap code does not maintain order:

var sb = new StringBuilder();
using var fs = File.Create("out.txt");

var target = PipeTarget.Merge(
    PipeTarget.ToStream(Console.OpenStandardOutput()),
    PipeTarget.ToStringBuilder(sb),
    PipeTarget.ToStream(fs)
);

await (Cli.Wrap("test.bat") | (target, target))
    .ExecuteAsync();
Output
OUT 1
ERR 2
OUT 3
ERR 4
OUT 5
ERR 6
OUT 7
ERR 8
OUT 9
ERR 10
ERR 12
OUT 11
ERR 14
OUT 13
OUT 15
ERR 16
OUT 17
ERR 18
ERR 20
OUT 19
ERR 22
OUT 21
ERR 24
OUT 23
OUT 25
OUT 27
ERR 26
OUT 29
ERR 28
OUT 31
OUT 33
OUT 35
ERR 30
ERR 32
ERR 34
OUT 37
ERR 36
ERR 38
OUT 39
OUT 41
ERR 40
ERR 42
OUT 43
ERR 44
OUT 45
ERR 46
ERR 48
OUT 47
OUT 49
ERR 50
OUT 51
OUT 53
OUT 55
OUT 57
OUT 59
ERR 52
ERR 54
OUT 61
ERR 56
OUT 63
OUT 65
ERR 58
OUT 67
ERR 60
OUT 69
OUT 71
ERR 62
ERR 64
OUT 73
OUT 75
ERR 66
OUT 77
OUT 79
ERR 68
ERR 70
OUT 81
ERR 72
OUT 83
OUT 85
ERR 74
OUT 87
ERR 76
OUT 89
ERR 78
ERR 80
OUT 91
ERR 82
OUT 93
ERR 84
OUT 95
ERR 86
OUT 97
ERR 88
ERR 90
ERR 92
OUT 99
ERR 94
ERR 96
ERR 98
ERR 100

The following CliWrap code does maintain order:

var sb = new StringBuilder();
using var fs = File.Create("out.txt");

using var semaphore = new SemaphoreSlim(1, 1);
var target = PipeTarget.ToDelegate(async line =>
{
    await semaphore.WaitAsync();
    try
    {
        Console.WriteLine(line);
        sb.AppendLine(line);
        await fs.WriteAsync(Encoding.UTF8.GetBytes(line));
    }
    finally
    {
        semaphore.Release();
    }
});

await (Cli.Wrap("test.bat") | (target, target))
    .ExecuteAsync();
Output
OUT 1
ERR 2 
OUT 3
ERR 4 
OUT 5
ERR 6 
OUT 7
ERR 8 
OUT 9
ERR 10 
OUT 11
ERR 12 
OUT 13
ERR 14 
OUT 15
ERR 16 
OUT 17
ERR 18 
OUT 19
ERR 20 
OUT 21
ERR 22 
OUT 23
ERR 24 
OUT 25
ERR 26 
OUT 27
ERR 28 
OUT 29
ERR 30 
OUT 31
ERR 32 
OUT 33
ERR 34 
OUT 35
ERR 36 
OUT 37
ERR 38 
OUT 39
ERR 40 
OUT 41
ERR 42 
OUT 43
ERR 44 
OUT 45
ERR 46 
OUT 47
ERR 48 
OUT 49
ERR 50 
OUT 51
ERR 52 
OUT 53
ERR 54 
OUT 55
ERR 56 
OUT 57
ERR 58 
OUT 59
ERR 60 
OUT 61
ERR 62 
OUT 63
ERR 64 
OUT 65
ERR 66 
OUT 67
ERR 68 
OUT 69
ERR 70 
OUT 71
ERR 72 
OUT 73
ERR 74 
OUT 75
ERR 76 
OUT 77
ERR 78 
OUT 79
ERR 80 
OUT 81
ERR 82 
OUT 83
ERR 84 
OUT 85
ERR 86 
OUT 87
ERR 88 
OUT 89
ERR 90 
OUT 91
ERR 92 
OUT 93
ERR 94 
OUT 95
ERR 96 
OUT 97
ERR 98 
OUT 99
ERR 100 

I'm not sure how cmd or PowerShell do it on their end, but I assume they have a way to differentiate between discrete write operations, which allows them to synchronize them even if the outputs are not separated by line breaks (or any other predefined character sequence).

On CliWrap's side, stdout and stderr are just homogeneous binary streams, so it's impossible (to my knowledge) to reliably identify individual writes. The ToDelegate(...) approach above identifies them by line breaks.
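
To illustrate what I mean, here's a rough sketch of the shape of the problem (my own illustration, not CliWrap's internals; it assumes the test.bat from earlier): once the two redirected streams have been drained, all you have is two separate byte blobs with no record of how the writes were interleaved.

using System.Diagnostics;
using System.IO;

var process = Process.Start(new ProcessStartInfo("test.bat")
{
    RedirectStandardOutput = true,
    RedirectStandardError = true
})!;

using var stdOutBuffer = new MemoryStream();
using var stdErrBuffer = new MemoryStream();

// Both streams are pumped concurrently; neither carries timestamps or
// sequence numbers for the individual writes made by the child process.
await Task.WhenAll(
    process.StandardOutput.BaseStream.CopyToAsync(stdOutBuffer),
    process.StandardError.BaseStream.CopyToAsync(stdErrBuffer));

await process.WaitForExitAsync();

// All the bytes are here, but the original interleaving is gone, so any
// merged order produced afterwards is a guess.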

For completeness’s sake, I tested it out with Process too and it also doesn't ensure the order of writes:

using var process = new Process
{
    StartInfo = new ProcessStartInfo
    {
        FileName = "test.bat",
        RedirectStandardOutput = true,
        RedirectStandardError = true
    },
    EnableRaisingEvents = true
};

process.OutputDataReceived += (sender, args) =>
{
    if (args.Data is not null)
    {
        Console.WriteLine(args.Data);
    }
};

process.ErrorDataReceived += (sender, args) =>
{
    if (args.Data is not null)
    {
        Console.WriteLine(args.Data);
    }
};

process.Start();
process.BeginOutputReadLine();
process.BeginErrorReadLine();

process.WaitForExit();

Unfortunately, it seems that with a larger sequence of writes, even my second method doesn't work. I think it may be impossible to achieve this directly.

However, one trick you can do is wrap the shell and merge stdout and stderr:

Cli.Wrap("cmd").WithArguments(new[] {"/c", "test.bat 2>&1"})

Then you can process the output normally in CliWrap, since it's now just one stream:

var sb = new StringBuilder();
using var fs = File.Create("out.txt");

var target = PipeTarget.Merge(
    PipeTarget.ToStream(Console.OpenStandardOutput()),
    PipeTarget.ToStringBuilder(sb),
    PipeTarget.ToStream(fs)
);

await (Cli.Wrap("cmd").WithArguments(new[] {"/c", "test.bat 2>&1"}) | target)
    .ExecuteAsync();
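
The same trick should also work on Linux/macOS by wrapping the shell there instead. Untested sketch, assuming a hypothetical test.sh equivalent of the batch file:

// Let /bin/sh merge the streams before CliWrap ever sees them.
await (Cli.Wrap("/bin/sh").WithArguments(new[] { "-c", "./test.sh 2>&1" }) | target)
    .ExecuteAsync();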

Sorry I didn't reply for weeks, I literally had the tab open but too many things to do.
Do you mean it's totally impossible (as far as you understand) to maintain output order with C#? Or with any .NET language? At which level do things break, in your opinion?
I've always had problems with this in C# myself and always assumed I was doing something wrong, since I'm not a C# dev.
But if it's totally impossible, it might be useful to report it to someone (to whom, I don't know; it depends).

tyfyt

Do you mean it's totally impossible (as far as you understand) to maintain output order with C#? Or with any .NET language? At which level do things break, in your opinion?

It's definitely impossible using .NET's Process class (which CliWrap is based on), from my understanding. If you make your own native calls to create and run a process, then it should be theoretically possible on Windows, I believe. I don't know about other operating systems, though.

This is not specific to C#, however; you can find people facing the same challenges in other languages and platforms:

Many of the answers in those threads reference "buffering" as being the cause. That's probably true at the system level, but even if the client consistently flushed data after every write, the order could still get mangled by the layers of indirection it has to go through before it reaches your code. In .NET, for example, you can theoretically do a Task.WhenAny(...) loop on stdout/stderr reads, but that would only preserve order assuming it's guaranteed in all the underlying stream plumbing (which it probably is not).
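
For illustration, this is roughly the shape of that Task.WhenAny(...) loop (an untested sketch of my own, not something CliWrap does internally; it reuses the test.bat from earlier and merges at line granularity):

using System.Diagnostics;
using System.Text;

var process = Process.Start(new ProcessStartInfo("test.bat")
{
    RedirectStandardOutput = true,
    RedirectStandardError = true
})!;

var merged = new StringBuilder();
Task<string?>? outTask = process.StandardOutput.ReadLineAsync();
Task<string?>? errTask = process.StandardError.ReadLineAsync();

while (outTask is not null || errTask is not null)
{
    // Race the next read on each stream and take whichever completes first
    // (or the only stream that is still open).
    var finished = outTask is not null && errTask is not null
        ? await Task.WhenAny(outTask, errTask)
        : (outTask ?? errTask)!;

    var line = await finished;

    if (finished == outTask)
        outTask = line is null ? null : process.StandardOutput.ReadLineAsync();
    else
        errTask = line is null ? null : process.StandardError.ReadLineAsync();

    if (line is not null)
        merged.AppendLine(line);
}

await process.WaitForExitAsync();

Even then, this only merges at line granularity, which is the same limitation as the ToDelegate(...) approach.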

I think the main barrier is that the standard streams were not meant to be synchronized from a conceptual standpoint, so any successful attempts to achieve that may just be coincidental.

Yes, I saw a few of those exact threads you linked. I did find some solution for Linux a long time ago (can't remember exactly what) that altered or even disabled the console's buffering and allowed for ordered output. I can't remember if I ever tested it though, as I focus mainly on Windows at the moment.
Still, there has to be a way to "hook deeper" into the system calls, right below where said buffering occurs... but how? How do cmd/pwsh do it, for example? But this is more a question for some Stack* website, I think.
It's bittersweet to know that me being a C# newbie wasn't the problem :\