Stdout and stderr order not preserved
aetonsi opened this issue · comments
Version
3.6.0
Details
I was trying to output the entire invocation's output (so both stdout and stderr) to a file, but the order of output and error data is always messed up.
AFAIK, maintaining the output's correct order is a common problem with command line programs, due to output buffering or something...
Steps to reproduce
As test, i used a simple php
invocation (all the following scripts are run in powershell):
PS> php -r "echo ' o111 '; fwrite(STDERR, ' E222 '); fwrite(STDERR,' E333 '); echo ' o444 '; "
# and the output is:
o111 E222 E333 o444 o555
But if i try to invoke that in a C# app via Cliwrap, everything i try messes up the output in some way.
I tried outputting to the console output stream, to a StringBuilder
, to a file. Nothing works correctly.
Here's a simple c# test program:
using CliWrap;
using System.Text;
string program = "php";
string[] program_args = new[] { "-r", "echo ' o111 '; fwrite(STDERR, ' E222 '); fwrite(STDERR,' E333 '); echo ' o444 '; " };
string fp = @"c:\tmp\buff.txt";
FileStream fs = System.IO.File.Open(fp, FileMode.Create);
StringBuilder sb = new StringBuilder();
PipeTarget t = PipeTarget.Merge(new[] {
PipeTarget.ToStream(Console.OpenStandardOutput()),
PipeTarget.ToStringBuilder(sb),
PipeTarget.ToStream(fs)
});
Command cmd = Cli.Wrap(program)
.WithArguments(program_args, true)
.WithValidation(CommandResultValidation.None) | (t, t);
Console.Write(","); // to create CSV structure (it concatenates to the native php call output)
await (cmd.ExecuteAsync()); // output: console standard output stream
Console.Write("," + sb.ToString()); // output: string builder
fs.Close();
Console.Write("," + System.IO.File.OpenText(fp).ReadToEnd()); // output: file contents
And here's a powershell script to compare the correct native output, to Cliwrap's tries:
# it loops 10 times
1..10 | ForEach-Object {
& php -r "echo ' o111 '; fwrite(STDERR, ' E222 '); fwrite(STDERR,' E333 '); echo ' o444 '; "
& .\myprogram
''
}
The output is practically (willingly) in CSV format, so here's the result converted to a markdown table:
direct php call | console | string builder | file |
---|---|---|---|
o111 E222 E333 o444 |
E222 o111 E333 o444 |
o111 o444 E222 E333 |
o111 E222 E333 o444 |
o111 E222 E333 o444 |
E222 o111 E333 o444 |
o111 E222 E333 o444 |
o111 E222 o444 E333 |
o111 E222 E333 o444 |
E222 o111 E333 o444 |
o111 o444 E222 E333 |
o111 E222 o444 E333 |
o111 E222 E333 o444 |
o111 E222 o444 E333 |
E222 E333 o111 o444 |
E222 o111 E333 o444 |
o111 E222 E333 o444 |
E222 o111 E333 o444 |
E222 E333 o111 o444 |
o111 E222 o444 E333 |
o111 E222 E333 o444 |
o111 E222 o444 E333 |
o111 E222 o444 E333 |
o111 E222 o444 E333 |
o111 E222 E333 o444 |
o111 E222 o444 E333 |
E222 o111 E33 o444 |
E222 o111 E333 o444 |
o111 E222 E333 o444 |
o111 E222 o444 E333 |
o111 o444 E222 E333 |
o111 E222 o444 E333 |
o111 E222 E333 o444 |
E222 E333 o111 o444 |
o111 E222 E333 o444 |
E222 o111 E333 o444 |
o111 E222 E333 o444 |
E222 o111 E333 o444 |
o111 E222 E33 o444 |
E222 o111 E333 o444 |
As you can see, when directly invoking php, the output is always in correct order. When piped via Cliwrap, it's always randomly out of order, and sometimes it's even partially truncated (E33
instead of E333
on a couple of lines).
Is there any way to fix this ordering problem, or any way to pipe/output everything together and have it in the correct order?
thank you
have a nice day
If you run your php script in a command shell and pipe the stdout/stderr to separate files, does it maintain order?
yeah you can try it yourself if you want, i simply used php as a simple means of having irregularly alternating output and error data.
You can create a simple php script with totally random alternating stdout/stderr like this (pwsh):
'<?php ' > .\script.php
1..10 | ForEach-Object {
$stream = @('STDOUT', 'STDERR') | Get-Random
$text = "$("$stream".Substring(3))$_"
"fwrite($stream, '$text ');" >>script.php
}
That generates a script.php
like this:
<?php
fwrite(STDOUT, 'OUT1 ');
fwrite(STDERR, 'ERR2 ');
fwrite(STDERR, 'ERR3 ');
fwrite(STDERR, 'ERR4 ');
fwrite(STDOUT, 'OUT5 ');
fwrite(STDERR, 'ERR6 ');
fwrite(STDOUT, 'OUT7 ');
fwrite(STDOUT, 'OUT8 ');
fwrite(STDERR, 'ERR9 ');
fwrite(STDOUT, 'OUT10 ');
And if you invoke the same script 10 times the output is always the same (note the >>stdout.log 2>>stderr.log
splitting the output):
1..10 | ForEach-Object {
& php -f script.php >>stdout.log 2>>stderr.log
}
it generates the following files:
stdout.log
OUT1 OUT5 OUT7 OUT8 OUT10
OUT1 OUT5 OUT7 OUT8 OUT10
OUT1 OUT5 OUT7 OUT8 OUT10
OUT1 OUT5 OUT7 OUT8 OUT10
OUT1 OUT5 OUT7 OUT8 OUT10
OUT1 OUT5 OUT7 OUT8 OUT10
OUT1 OUT5 OUT7 OUT8 OUT10
OUT1 OUT5 OUT7 OUT8 OUT10
OUT1 OUT5 OUT7 OUT8 OUT10
OUT1 OUT5 OUT7 OUT8 OUT10
stderr.log
ERR2 ERR3 ERR4 ERR6 ERR9
ERR2 ERR3 ERR4 ERR6 ERR9
ERR2 ERR3 ERR4 ERR6 ERR9
ERR2 ERR3 ERR4 ERR6 ERR9
ERR2 ERR3 ERR4 ERR6 ERR9
ERR2 ERR3 ERR4 ERR6 ERR9
ERR2 ERR3 ERR4 ERR6 ERR9
ERR2 ERR3 ERR4 ERR6 ERR9
ERR2 ERR3 ERR4 ERR6 ERR9
ERR2 ERR3 ERR4 ERR6 ERR9
Problems arise only when using Cliwrap in c#.
I am using the | ( t, t )
to pipe to the same FileStream
object, shouldn't this work? Or is it wrong?
I am using the
| ( t, t )
to pipe to the sameFileStream
object, shouldn't this work? Or is it wrong?
That is correct. However, the stream implementation itself may not be thread-safe. You can try using Stream.Synchronized(...)
although I think that only affects sync methods and probably won't help much in this case anyway.
yes as you guessed using a synchronized stream doesn't make a difference:
Stream ss = Stream.Synchronized(fs);
PipeTarget t = PipeTarget.Merge(new[] {
PipeTarget.ToStream(ss)
});
same result.
The stringbuilder is even worse, it loses pieces of output down the road:
test code
using CliWrap;
using System.Text;
string program = "php";
string[] program_args = new[] { "-f", "script.php" };
string fp = @".\cliwrap.log";
StringBuilder sb = new StringBuilder();
PipeTarget t = PipeTarget.Merge(new[] {
PipeTarget.ToStringBuilder(sb)
});
await (
Cli.Wrap(program)
.WithArguments(program_args, true)
.WithValidation(CommandResultValidation.None) | (t, t)
).ExecuteAsync();
File.AppendAllText(fp, sb.ToString());
File.AppendAllText(fp, Environment.NewLine);
stringbuilder result
ERR2 OUT1 ERR3 ERR4 ERR6 ERR9 OUT5 OUT7 OUT8 OUT10
ERR2 OUT1 OUT5 OUT7 OUT8 OUT10 ERR3 ERR4 ERR6 ERR9
ERR2 ERR3 ERR4 ERR6 ERR9 OUT5 OUT7 OUT8 OUT10
ERR2 OUT1 ERR3 ERR4 ERR6 ERR9 OUT5 OUT7 OUT8 OUT10
OUT1 ERR2 OUT5 OUT7 OUT8 OUT10 ERR3 ERR4 ERR6 ERR9
OUT1 ERR2 ERR3 ERR4 ERR6 ERR9 OUT5 OUT7 OUT8 OUT10
OUT1 OUT5 OUT7 OUT8 OUT10 ERR3 ERR4 ERR6 ERR9
OUT1 OUT5 OUT7 OUT8 OUT10 ERR3 ERR4 ERR6 ERR9
ERR2 OUT1 ERR3 ERR4 ERR6 ERR9 OUT5 OUT7 OUT8 OUT10
ERR2 OUT1 ERR3 ERR4 ERR6 ERR9 OUT5 OUT7 OUT8 OUT10
The interesting thing is, PipeTarget.ToFile(...)
has the correct order and data is not broken up. It is simply missing the entire STDERR data:
code
using CliWrap;
string program = "php";
string[] program_args = new[] { "-f", "script.php" };
string fpr = @".\cliwrap.log" + (new Random().Next(1, 10000));
PipeTarget t = PipeTarget.Merge(new[] {
PipeTarget.ToFile(fpr),
});
await (
Cli.Wrap(program)
.WithArguments(program_args, true)
.WithValidation(CommandResultValidation.None) | (t, t)
).ExecuteAsync();
log (reading all "cliwrap.log*" that the script generates; 1 file content per line:)
to read all files: get-content -Path cliwrap.log\*
OUT1 OUT5 OUT7 OUT8 OUT10
OUT1 OUT5 OUT7 OUT8 OUT10
OUT1 OUT5 OUT7 OUT8 OUT10
OUT1 OUT5 OUT7 OUT8 OUT10
OUT1 OUT5 OUT7 OUT8 OUT10
OUT1 OUT5 OUT7 OUT8 OUT10
OUT1 OUT5 OUT7 OUT8 OUT10
OUT1 OUT5 OUT7 OUT8 OUT10
OUT1 OUT5 OUT7 OUT8 OUT10
OUT1 OUT5 OUT7 OUT8 OUT10
Okay, so your last comment suggests more to me that it might be a thread-safety issue. Try this:
using var semaphore = new SemaphoreSlim(1, 1);
var stringBuilder = new StringBuilder();
var target = PipeTarget.ToDelegate(async line =>
{
await semaphore.WaitAsync(cancellationToken);
try
{
stringBuilder.AppendLine(line);
}
finally
{
semaphore.Release();
}
});
var cmd = Cli.Wrap(...) | (target, target);
Haven't tested the above code myself so please review it.
I came back to investigate a bit more and reproduced the original scenario with the following cmd
script:
@echo off
setlocal EnableDelayedExpansion
for /L %%i in (1,1,100) do (
set /A "remainder=%%i %% 2"
if !remainder! == 0 (
echo ERR %%i 1>&2
) else (
echo OUT %%i
)
)
Running the following maintains order:
$ test.bat > out.txt 2>&1
The following CliWrap code does not maintain order:
var sb = new StringBuilder();
using var fs = File.Create("out.txt");
var target = PipeTarget.Merge(
PipeTarget.ToStream(Console.OpenStandardOutput()),
PipeTarget.ToStringBuilder(sb),
PipeTarget.ToStream(fs)
);
await (Cli.Wrap("test.bat") | (target, target))
.ExecuteAsync();
Output
OUT 1
ERR 2
OUT 3
ERR 4
OUT 5
ERR 6
OUT 7
ERR 8
OUT 9
ERR 10
ERR 12
OUT 11
ERR 14
OUT 13
OUT 15
ERR 16
OUT 17
ERR 18
ERR 20
OUT 19
ERR 22
OUT 21
ERR 24
OUT 23
OUT 25
OUT 27
ERR 26
OUT 29
ERR 28
OUT 31
OUT 33
OUT 35
ERR 30
ERR 32
ERR 34
OUT 37
ERR 36
ERR 38
OUT 39
OUT 41
ERR 40
ERR 42
OUT 43
ERR 44
OUT 45
ERR 46
ERR 48
OUT 47
OUT 49
ERR 50
OUT 51
OUT 53
OUT 55
OUT 57
OUT 59
ERR 52
ERR 54
OUT 61
ERR 56
OUT 63
OUT 65
ERR 58
OUT 67
ERR 60
OUT 69
OUT 71
ERR 62
ERR 64
OUT 73
OUT 75
ERR 66
OUT 77
OUT 79
ERR 68
ERR 70
OUT 81
ERR 72
OUT 83
OUT 85
ERR 74
OUT 87
ERR 76
OUT 89
ERR 78
ERR 80
OUT 91
ERR 82
OUT 93
ERR 84
OUT 95
ERR 86
OUT 97
ERR 88
ERR 90
ERR 92
OUT 99
ERR 94
ERR 96
ERR 98
ERR 100
The following CliWrap code does maintain order:
var sb = new StringBuilder();
using var fs = File.Create("out.txt");
using var semaphore = new SemaphoreSlim(1, 1);
var target = PipeTarget.ToDelegate(async line =>
{
await semaphore.WaitAsync();
try
{
Console.WriteLine(line);
sb.AppendLine(line);
await fs.WriteAsync(Encoding.UTF8.GetBytes(line));
}
finally
{
semaphore.Release();
}
});
await (Cli.Wrap("test.bat") | (target, target))
.ExecuteAsync();
Output
OUT 1
ERR 2
OUT 3
ERR 4
OUT 5
ERR 6
OUT 7
ERR 8
OUT 9
ERR 10
OUT 11
ERR 12
OUT 13
ERR 14
OUT 15
ERR 16
OUT 17
ERR 18
OUT 19
ERR 20
OUT 21
ERR 22
OUT 23
ERR 24
OUT 25
ERR 26
OUT 27
ERR 28
OUT 29
ERR 30
OUT 31
ERR 32
OUT 33
ERR 34
OUT 35
ERR 36
OUT 37
ERR 38
OUT 39
ERR 40
OUT 41
ERR 42
OUT 43
ERR 44
OUT 45
ERR 46
OUT 47
ERR 48
OUT 49
ERR 50
OUT 51
ERR 52
OUT 53
ERR 54
OUT 55
ERR 56
OUT 57
ERR 58
OUT 59
ERR 60
OUT 61
ERR 62
OUT 63
ERR 64
OUT 65
ERR 66
OUT 67
ERR 68
OUT 69
ERR 70
OUT 71
ERR 72
OUT 73
ERR 74
OUT 75
ERR 76
OUT 77
ERR 78
OUT 79
ERR 80
OUT 81
ERR 82
OUT 83
ERR 84
OUT 85
ERR 86
OUT 87
ERR 88
OUT 89
ERR 90
OUT 91
ERR 92
OUT 93
ERR 94
OUT 95
ERR 96
OUT 97
ERR 98
OUT 99
ERR 100
I'm not sure how cmd
or PowerShell do it on their end, but I assume they have a way to differentiate between discrete write operations, which allows them to synchronize them even if the outputs are not separated by line breaks (or any other predefined character sequence).
On CliWrap's side, stdout and stderr are just homogenous binary streams, so it's impossible (to my knowldge) to reliably identify individual writes. The ToDelegate(...)
approach above identifies them by line breaks.
For completeness’s sake, I tested it out with Process
too and it also doesn't ensure the order of writes:
using var process = new Process
{
StartInfo = new ProcessStartInfo
{
FileName = "test.bat",
RedirectStandardOutput = true,
RedirectStandardError = true
},
EnableRaisingEvents = true
};
process.OutputDataReceived += (sender, args) =>
{
if (args.Data is not null)
{
Console.WriteLine(args.Data);
}
};
process.ErrorDataReceived += (sender, args) =>
{
if (args.Data is not null)
{
Console.WriteLine(args.Data);
}
};
process.Start();
process.BeginOutputReadLine();
process.BeginErrorReadLine();
process.WaitForExit();
Unfortunately, it seems that with a larger sequence of writes, even my second method doesn't work. I think it may be impossible to achieve this directly.
However, one trick you can do is wrap the shell and merge stdout and stderr:
Cli.Wrap("cmd").WithArguments(new[] {"/c", "test.bat 2>&1"})
Then you can process the output normally in CliWrap, since it's now just one stream:
var sb = new StringBuilder();
using var fs = File.Create("out.txt");
var target = PipeTarget.Merge(
PipeTarget.ToStream(Console.OpenStandardOutput()),
PipeTarget.ToStringBuilder(sb),
PipeTarget.ToStream(fs)
);
await (Cli.Wrap("cmd").WithArguments(new[] {"/c", "test.bat 2>&1"}) | target)
.ExecuteAsync();
Sorry i didn't reply for weeks, i literally had the tab open but too many things to do..
You mean to say it's totally impossible (as far as you understand) to maintain output order with c#? Or with any .NET language maybe? At which level do things break in your opinion?
I've always had problems myself with this thing in c# and always assumed i was doing something wrong since i'm not a c# dev.
But if it's totally impossible it might be useful to report it to someone (to whom, i don't know, it depends)
tyfyt
You mean to say it's totally impossible (as far as you understand) to maintain output order with c#? Or with any .NET language maybe? At which level do things break in your opinion?
It's definitely impossible using .NET's Process
class (which CliWrap is based on), from my understanding. If you make your own native calls to create and run a process, then it should be theoretically possible on Windows, I believe. I don't know about other operating systems, though.
This is not specific to C# however, as you can find people facing the same challenges in other languages and platforms:
- https://stackoverflow.com/a/57119188/2205454
- https://stackoverflow.com/a/4959696/2205454
- https://unix.stackexchange.com/a/701896
- https://stackoverflow.com/a/60232612/2205454
Many of the answers above reference "buffering" as being the cause. That's probably true at the system level, but even if the client consistently flushed data after every write, the order of data could still get mangled by the layers of indirection that it has to go through before it reaches your code. In .NET, for example, you can theoretically do a Task.WhenAny(...)
loop on stdout/stderr reads, but that would only preserve order assuming it's guaranteed in all the underlying stream plumbing (which it probably is not).
I think the main barrier is that the standard streams were not meant to be synchronized from a conceptual standpoint, so any successful attempts to achieve that may just be coincidental.
Yes i saw a few of those exact threads you linked.. i did find some solution for linux a long time ago, can't remember what exactly, that altered or even disabled the console's buffering and allowed for ordered output. I can't remember if i even tested it though, as i focus mainly on Windows atm.
Still, there has to be a way to like, "hook deeper" into the system's calls, right below where said buffering occurs.. but how? how do cmd/pwsh do it for example? but this is more a question for some stack* website i think....
It's bittersweet to know that me being a c# newbie wasn't the problem :\