netinvent / command_runner

A substitute for subprocess that handles all the hassle that comes with different platforms and Python versions, and allows live stdout and stderr capture for background job / interactive GUI programming ;)

Getting non-blocking live output for `stdout` and `stderr`

brendan-simon-indt opened this issue · comments

Is there a way to get live output for both stdout and stderr at the same time?

If there is a long running process that outputs to stdout with some errors on stderr interspersed, how can I read them in real-time (so that I can display both stdout and stderr as they happen)?

So far, in my tests on Windows, I get both stdout and stderr output when using live_output=True.

Test setup:
Create a test.ps1 file containing:

Write-Output "BEGIN"
sleep 1
Write-Output "1SEC"
sleep 1
Write-Error "2SEC ERROR"
sleep 1
Write-Output "3SEC"
sleep 1
Write-Output "END"

Create a test.py file containing:

cmd=r"C:\WINDOWS\system32\WindowsPowerShell\v1.0\powershell.exe C:\GIT\command_runner\command_runner\test.ps1"
exit_code, output = command_runner(cmd, shell=True, live_output=True)
print("SCRIPT FINISHED. OUTPUT WAS:")
print(output)

Output (the lines before SCRIPT FINISHED appear one per second):

BEGIN
1SEC
C:\GIT\command_runner\command_runner\test.ps1 : 2SEC ERROR
At line:1 char:1
+ C:\GIT\command_runner\command_runner\test.ps1
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Write-Error], WriteErrorException
    + FullyQualifiedErrorId : Microsoft.PowerShell.Commands.WriteErrorException,test.ps1
 
3SEC
END
SCRIPT FINISHED. OUTPUT WAS:
BEGIN
1SEC
C:\GIT\command_runner\command_runner\test.ps1 : 2SEC ERROR
At line:1 char:1
+ C:\GIT\command_runner\command_runner\test.ps1
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Write-Error], WriteErrorException
    + FullyQualifiedErrorId : Microsoft.PowerShell.Commands.WriteErrorException,test.ps1
 
3SEC
END

Could you come up with a test case that produces stdout and some stderr output so I can write some tests, please?
Perhaps your env is Linux, which should work just like Windows. A test script would be useful.

I did the same test under Linux, and it works as expected too:

File test.sh containing:

echoerr() { echo "$@" 1>&2; }

echo "BEGIN"
sleep 1
echo "1SEC"
sleep 1
echoerr "2SEC ERR"
sleep 1
echo "3SEC"
sleep 1
echo "END"

File test.py containing:

from command_runner import command_runner

cmd="/usr/bin/bash test.sh"
exit_code, output = command_runner(cmd, live_output=True)
print("SCRIPT FINISHED, OUTPUT WAS:")
print(output)

Output (the lines before SCRIPT FINISHED appear one per second):

BEGIN
1SEC
2SEC ERR
3SEC
END
SCRIPT FINISHED, OUTPUT WAS:
BEGIN
1SEC
2SEC ERR
3SEC
END

I need a test case for what you're trying to achieve.
Both my tests were done with v1.3.1.

Hi. Ok I will try out your module and give it a go.

My use case is a Windows app communicating with some remote Linux boxes via ssh (so the Windows app will call ssh or rsync and report back output).

My plan was to parse the live output so I can update a GUI progress bar, etc, for feedback on the long running command.

Looks like the live_output=True option just echoes output to stdout and stderr.
I don't think my Python app can capture that for partial processing.
Is there a callback option that can be passed to command_runner?
That way my app can process partial data in real-time.

That's what live_output=True is supposed to do, print output while executing.

What you're looking for is a way to get output back to your program before execution ends.
There are multiple ways to achieve this:

  1. Specify a file for stdout and stderr and read from them

ex: command_runner(mycmd, stdout=r"C:\somepath\stdout.log")
While it's the easiest way, you'll have to deal with reading the file as it gets modified (see the polling sketch after this list).

  2. Use a Queue()

ex:

import queue

output_queue = queue.Queue()
command_runner(mycmd, stdout=output_queue, stderr=output_queue)

# Read from queue
while True:
    try:
        line = output_queue.get(timeout=.1)
    except queue.Empty:
        pass
    else:
        if line is None:
            break
        else:
            pass  # your code using the "line" variable
  3. Use a callback function

ex:

def my_func(string):
    pass  # your code processing "string"

command_runner(mycmd, stdout=my_func, stderr=my_func)

Easy way, but you might get garbage (like half of an output line or so) depending on what subprocess returns.
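
Coming back to solution 1, reading the file as it grows could look like this (a sketch; assumes command_runner runs in a separate thread and has already created the file, and path and is_done are placeholders):

import time

def tail_file(path, is_done):
    """Yield new lines from a growing log file until is_done() returns True."""
    with open(path, "r") as log:
        while True:
            line = log.readline()
            if line:
                yield line.rstrip("\n")
            elif is_done():
                return
            else:
                time.sleep(0.1)  # wait for the file to grow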

Solutions 1 and 2 require you to run command_runner in a thread, since it is blocking.
Solution 3 doesn't require threading, but you'll have to handle a buffer to reconstruct partial strings, as in the sketch below.
Also, solution 3 may render the timeout argument unreliable if your callback function blocks, unless you thread the callback too.
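
For solution 3, that buffering could look something like this (a sketch; assumes the callback receives text chunks, mycmd as above):

class LineBuffer:
    """Reassemble complete lines from possibly partial callback chunks."""
    def __init__(self):
        self._partial = ""

    def feed(self, chunk):
        self._partial += chunk
        # Everything before the last newline is complete; keep the remainder
        *lines, self._partial = self._partial.split("\n")
        for line in lines:
            print("LINE:", line)  # replace with your own processing

buffer = LineBuffer()
command_runner(mycmd, stdout=buffer.feed, stderr=buffer.feed)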

Anyway, since you're building a GUI, I guess you already use threads in order to avoid blocking the UI.
All three solutions still need to be implemented; they are fairly easy to code, and not even mutually exclusive.
Which one would you like to check out?

As a side note, I've coded a couple of threaded GUIs for Windows. What GUI lib are you using?

Awesome! Lots of options.

If subprocess.Popen() is called in text mode (i.e. text=True or universal_newlines=True), would stdout and stderr be line buffered?

I'm using wxPython (latest 4.1.2a snapshot - latest working one) as my GUI library.

I haven't addressed the threading/blocking issue yet. It's a new app and I'm just experimenting with the low level technologies for communications using ssh and rsync. I think I'm good with those now, and am now looking at the best way to run them (e.g. I was using subprocess.run() directly, then moved to subprocess.Popen() to get more control over real-time output, and then I found command_runner).

However, I do need to sort out the blocking issue, as I intend to run multiple commands simultaneously - e.g. rsync multiple files/directories to multiple Linux boxes. There might be multiple dialogs for output feedback or interaction?

wxPython does have an API (producer and consumer) for long running actions (using threads).

I've also used wxasync (provides asyncio support for wxPython). The asyncio routines still have to play nice and yield though.

Initially I was wondering if just running the commands as part of a new dialog would be enough (dialogs have their own event loop apparently - at least wxPython does). I haven't trialed that yet, but I suspect a blocking dialog might block other GUI event loops.

Otherwise just running the command in a thread (python threading or wxPython producer/consumer) might be simpler.

Yes, subprocess.Popen would be line buffered, but you'll have lots of trouble when switching Python versions. I had enough subprocess trouble to push me into developing an overlay.
I actually developed command_runner in order to avoid subprocess compatibility issues (some versions miss timeout, others don't decode text properly).
But initially, it was developed because the subprocess timeout argument doesn't work when launching Windows GUI apps, since the Windows stream.readline() implementation is blocking.
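
For reference, the plain subprocess pattern we're discussing looks roughly like this (a sketch; text=True needs Python 3.7+, older versions use universal_newlines=True):

import subprocess

proc = subprocess.Popen("ping 127.0.0.1", shell=True,
                        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                        text=True, bufsize=1)  # bufsize=1 + text=True: line buffered
for line in proc.stdout:  # on Windows, this readline-style iteration blocks
    print(line, end="")
proc.wait()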

There are some reasons why you would want to keep control over your threads instead of letting a GUI lib decide, since you definitely want to be able to shut down rsync properly when someone closes your GUI or your progress bar.

So far, I think the best route you could go would be the Queue route.
I can implement that function fairly quickly in command_runner if you're interested.

To implement command_runner as a thread, you could do something like:

import threading

cmd_thread = threading.Thread(target=command_runner, args=(cmd,))
cmd_thread.daemon = True  # thread dies with the program
cmd_thread.start()

In order to stop a command_runner thread properly, we could add an argument that executes a function to know whether we still need to run, ex:

def my_gui_process_is_running():
    return True if whatever_condition_you_are_checking else False

Then run command_runner with the argument stop_on=my_gui_process_is_running (passing the function itself, so it can be polled during execution).
Again, I can implement this fairly quickly if you want and could help you out using it properly.
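
For instance, a GUI could wire it up like the sketch below (assuming the convention that the callable is polled during execution and the command stops when it returns True):

import threading
from command_runner import command_runner

gui_closed = threading.Event()  # set this from your GUI close handler

def should_stop():
    # polled periodically by command_runner; True means terminate the command
    return gui_closed.is_set()

exit_code, output = command_runner("ping 127.0.0.1", shell=True, stop_on=should_stop)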

Btw, having written a lot of Windows GUI apps, you should definitely check out PySimpleGUI.
PySimpleGUI is a framework that can use Tk, Qt, or Wx on both Windows and Linux.
I found it to be the easiest way to achieve a full blown Windows GUI, with progress bars, graphs, controls, etc.

Let me know if I can help you out here.
I am definitely interested in improving command_runner to become a de-facto, easy-to-implement substitute for subprocess.Popen, of which it already accepts all arguments.

Thanks for all that. I'll stick with wxPython for now, and might investigate PSG when I have more time. It looks interesting, but I fear it might not have some of the more advanced widgets that wxPython has (e.g. TreeListCtrl, etc).

I thought that the Queue, File and Callback features were already implemented. Did I misinterpret your previous response?

I'm happy to try out the queue option but the callback option also appeals to me.

Indeed, you did misinterpret: I said I could easily implement those.
That said, I did so in branch https://github.com/netinvent/command_runner/tree/generic-return-improvments
Haven't documented everything yet nor written all tests, but it should work the way I wrote above, for both callbacks and output queues.
You can specify different callbacks/queues for stdout or stderr, or just specify one type for stdout so stderr gets redirected to stdout.
I also wrote the stop_on part which can be fairly useful for GUI.

Feel free to comment if you have trouble using it.

I'm having a look at it now.

One note regarding the changes in README.md:

Example: `command_runner(cmd, min_resolution=0.2)`

I don't like the min_resolution name. I think interval is better (or maybe check_interval if you want to be more verbose/explicit).

It would also be cool if there were options in command_runner to create and start a thread for the call (e.g. using threading or multiprocessing or even asyncio - but maybe that introduces too many dependencies?)

I don't like the min_resolution name. I think interval is better (or maybe check_interval if you want to be more verbose/explicit).

Indeed, min_resolution was the internal name before becoming an argument. check_interval sounds pretty good to me. [EDIT] Changed in branch generic-return-improvements[/EDIT]

It would also be cool if there were options in command_runner to create and start a thread for the call (e.g. using threading or multiprocessing or even asyncio - but maybe that introduces too many dependencies?)

I don't really see what functionality you are seeking here.

Internally, there are already threads to handle live output and timeouts.
Using asyncio won't work because the underlying subprocess module isn't written that way, and IMO it never will be.
Using multiprocessing isn't an option because the whole script needs to be written to allow multiprocess execution, hence I leave that up to the command_runner user.
Using threading will let you keep control over the program, but it won't allow multiple CPU core usage.

Basically, when I want to thread command_runner, I use a function decorator that threads execution in order to keep control in my program.

My threading decorator:

from threading import Thread
from concurrent.futures import Future
from functools import wraps


def call_with_future(fn, future, args, kwargs):
    """
    Threading a function with return info using Future
    from https://stackoverflow.com/a/19846691/2635443

    """
    try:
        result = fn(*args, **kwargs)
        future.set_result(result)
    except Exception as exc:
        future.set_exception(exc)


def threaded(fn):
    """
    @threaded wrapper in order to thread any function

    the @wraps decorator's sole purpose is for function.__name__ to be the real function name
    instead of 'wrapper'

    """

    @wraps(fn)
    def wrapper(*args, **kwargs):
        future = Future()
        Thread(target=call_with_future, args=(fn, future, args, kwargs)).start()
        return future

    return wrapper

Then I can launch command_runner threaded like:

from time import sleep

# Wrap command_runner with the @threaded decorator
threaded_command_runner = threaded(command_runner)

thread_result = threaded_command_runner(cmd)
# MY CODE HERE CONTINUES SINCE command_runner RUNS IN A THREAD

while not thread_result.done():
    sleep(1)
# EXPLOIT RESULT SINCE IT'S DONE
exit_code, output = thread_result.result()

I think I now got what you're looking for.
I ended up adding my threading code to command_runner (baked it in directly so I don't add more dependencies; the original lib I use: https://github.com/netinvent/ofunctions/blob/master/ofunctions/threading/__init__.py)

Updated README.md, hopefully readable.
Added unit tests for callback and queue readings.
Drank coffee.

I think what you're searching for would look like the code below:

import queue
from time import sleep
from command_runner import command_runner_threaded

output_queue = queue.Queue()
# Launch command_runner as thread that will return a concurrent.future result after execution
thread_result = command_runner_threaded('ping 127.0.0.1', shell=True, method='poller', stdout=output_queue)

# Now read the queue given to stdout until execution ends
read_queue = True
while read_queue:
    if thread_result.done():
        read_queue = False
    try:
        line = output_queue.get(timeout=0.1)
    except queue.Empty:
        pass
    else:
        if line is None:
            break
        else:
            # ADD YOUR LIVE CODE HERE TO DEAL WITH RSYNC OUTPUT
            # basic rsync regex example (needs "import re" at the top):
            # try:
            #     result = re.search(r"(.*)xfer(.*)", line)
            #     print(result.group(1), result.group(2))
            # except AttributeError:
            #     pass

# Now we may get exit_code and full output since result has become available at this point
exit_code, output = thread_result.result()

Does this fit your needs?
Of course if you want to read stdout and stderr separately, you'll have to specify another queue for stderr and read that one too.

That seems to be working and I am using separate queues for stdout and stderr.

I don't seem to get a None object when reading stderr though, as is the case with stdout.

I am detecting the end of the read process by checking when both queues return None.

Got your case replicated. This is what happens when you code at 1am...
I've fixed the part where the read loop wasn't waiting for the stderr queue to end, which created a race condition.
I also added tests, so this case is covered now.

Have a look at tests/test_command_runner.py function test_double_queue_threaded_stop() to see what queue read implementation I use for both stdout and stderr.

I've merged the branch with the above modifications into master today.
I had a lot of trouble getting Python 2.7 and PyPy compatibility with the modifications I made, but everything worked out well.
Did you succeed in using command_runner for your project? Is it working for you?
PS: updated the examples with easier code

Hi Deejan,
I had to park it for a little bit. I hope to get back on to it today or over the weekend. I am definitely keen to use it and think it will do the job nicely :)
I will let you know if I hit any roadblocks or have any questions or suggestions.

Hi Deejan.
I did some experiments with command_runner and command_runner_threaded, and here are my observations.

I have a read_task, which I run in a thread. I then use command_runner() (within a manually created thread) to run a command that outputs to stdout and stderr. The read_task now completes when both stdout_queue and stderr_queue return None :)

I repeated the above test using command_runner_threaded() (instead of a manually created thread) and the read_task does not complete. The stdout_queue returns None, but the stderr_queue does not.

Hope that makes sense.

I just redid all my tests, and noticed that there was a typo in the README.md example I made.
Fixed in 2081b1f

Also, I just released v1.4.0, with a lot more improvements and tests.
Please update your code to current release and, if you copied the code from README.md, please fix the typo ;)

I just ran the test again with command_runner_threaded. It works well for me; stderr_queue returns None.

Retested with v1.4.0 and same result.

The problem is that my flow is slightly different to your example.

  • start read thread
  • call command_runner_threaded() with my command
  • call exit_code, output = thread_result.result()

The thread_result.result() call blocks and the read thread doesn't execute at all (not until the command terminates).

I tried putting a read_thread.join() before thread_result.result() but a similar thing happens.

It seems the read thread will not run. I think the issue is that this is happening in a GUI button click event handler, and the thread only kicks off when the handler completes (so no looping within the button event handler).

I think you misunderstood the threading function.
Calling command_runner_threaded will give you back a result which can't be used until the thread has finished. In the meantime, you get live output via stdout/stderr queues.

The call to thread_result.result() should only be made once the queue read has ended, or else it will block until the command_runner thread is finished, since it cannot compute exit_code before.

Actually, you should not have a read thread at all, but a read queue loop, and call thread_result.result() after the read queue is finished. Since your read queue gets the stdout and stderr streams live, calling thread_result.result() only adds the exit code.

If you really need a separate read thread, you should call thread_result.result() only once read_thread.is_alive() is False, so your program will not block.
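
For example, that flow could look like this (a sketch with a minimal read_queues helper; adapt the line handling to your GUI):

import queue
import threading
from command_runner import command_runner_threaded

def read_queues(stdout_queue, stderr_queue):
    # Drain both queues until each has yielded its None end-marker
    pending = {stdout_queue, stderr_queue}
    while pending:
        for q in list(pending):
            try:
                line = q.get(timeout=0.1)
            except queue.Empty:
                continue
            if line is None:
                pending.discard(q)
            else:
                pass  # update GUI / parse the line here

stdout_queue, stderr_queue = queue.Queue(), queue.Queue()
read_thread = threading.Thread(target=read_queues, args=(stdout_queue, stderr_queue))
read_thread.start()

thread_result = command_runner_threaded('ping 127.0.0.1', shell=True, method='poller',
                                        stdout=stdout_queue, stderr=stderr_queue)

read_thread.join()  # returns once both queues are exhausted
exit_code, output = thread_result.result()  # thread already finished, so this won't block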

If you have a git repo, I'll happily have a look into your code.

Yes, understood. I do need a thread because it will be a long running task (uploading files to a remote box and then executing a file to perform some desired functionality). The idea is that either a dialog box appears to provide feedback (or a dashboard is updated), but the main GUI still needs to be responsive (e.g. to initiate more transactions with various other boxes).

One similar type of example would be Windows explorer doing a large file copy to a remote server. A dialog appears for transfer feedback, but Windows Explorer is still active and other file transfers can be initiated, etc.

No repo. I still need to update the GUI in the main GUI thread, so I'm thinking of using the GUI idle handler or timer event handler. Another option would be to use wxasync and use asyncio versions of queue.

Does command_runner support asyncio.Queue?

Actually, I don't think asyncio would work in my case, as I have to run some Windows binaries (e.g. rsync) and they won't play nice with asyncio (i.e. they will block), so threads it is.

command_runner does not support asyncio because the underlying subprocess module doesn't.

If you want your GUI to stay responsive, I'd use your read thread to update the GUI with stdout / stderr output that it receives from the queue given to command_runner_threaded.

Once the read thread is done, use thread_result.result() to get exit_code and full output for logs / success / error messages.

I have run all the test cases on Linux and Windows, from Python 2.7 to Python 3.10 and PyPy, with success, so I decided to release the version including the improvements you asked for.
Feel free to ask for other improvements, but I do think the current version will handle your scenario quite well.

Yes, I think command_runner will work well for me and will be using it. I'll let you know how it goes.

I can't directly update the GUI from the read thread, because the GUI will (eventually) crash. The GUI must be updated from the GUI thread, so I either need another thread communication mechanism (e.g. another queue) or some other way to place data into the GUI event loop (which all seems like double handling). At the moment I am planning to put the read queue(s) code into the GUI Idle or Timer event handlers.
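
Something along these lines, perhaps (rough sketch, untested; a wx.Timer handler drains the stdout queue so all GUI updates happen on the GUI thread):

import queue
import wx

class ProgressDialog(wx.Dialog):
    def __init__(self, parent, output_queue):
        super().__init__(parent, title="Transfer")
        self.output_queue = output_queue
        self.log = wx.TextCtrl(self, style=wx.TE_MULTILINE | wx.TE_READONLY)
        # Poll the queue from the GUI thread via a timer: no cross-thread GUI calls
        self.timer = wx.Timer(self)
        self.Bind(wx.EVT_TIMER, self.on_timer, self.timer)
        self.timer.Start(100)  # milliseconds

    def on_timer(self, event):
        try:
            while True:  # drain whatever is available right now
                line = self.output_queue.get_nowait()
                if line is None:  # end-marker: command finished
                    self.timer.Stop()
                    return
                self.log.AppendText(line)
        except queue.Empty:
            pass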

Closing this issue since the enhancement is done. Feel free to reopen issue if needed.