Option to redirect stdout/stderr to user shell?
sergei-mironov opened this issue · comments
Hi. I'm using codebraid to run machine-learning experiments, which sometimes take a long time to complete and tend to produce lots of rich debug output such as gauges and progress bars (the wget tool prints something similar).
Currently, I put long-running code in separate blocks and hide the rich output by adding {hide=stdout+stderr}. Ideally, it would be nice to include only the final state of progress bars, but I think that is not so important. It would be more useful to pass certain output through to the client shell as-is, so that the progress of an experiment can be monitored.
Could you please consider adding an option that, when enabled, redirects the stdout/stderr of a block to the user's shell?
It would be possible to redirect stdout and/or stderr to the shell, but then nothing from stdout/stderr would appear in the document (or, depending on implementation, all of it would, including intermediate progress bar states). Would that do what you want, or do you want some but not all stdout/stderr output in the document? I have plans that could enable selective output eventually, but I expect that would be somewhat complex.
It would be possible to redirect stdout and/or stderr to the shell, but then nothing from stdout/stderr would appear in the document (or, depending on implementation, all of it would, including intermediate progress bar states). Would that do what you want .. ?
Yes, I mean an option to forward the code's stdout/stderr to the stdout/stderr of the codebraid executable. Otherwise we face a long pause, which is very misleading. Here is an example of what I'm talking about: Report source is here and its HTML output is here. Note the way codebraid rendered the progress bar, which was originally printed by the tf.keras.Model.fit function. Of course, we may use an option to suppress verbose output, but in reality this fit function may take hours to complete, so some information on the process would be very helpful.
I have plans that could enable selective output eventually, but I expect that would be somewhat complex.
That would be great. I think that the key question here is how to define a syntax for filtering rules. Please consider checking it against progress bars like in the above example.
Another improvement I thought of is to handle special characters such as the carriage return ('return cursor without making newline'). Handling such characters in the above example would result in rendering only the final state of the progress bar.
Actually, in a perfect world we would probably want all three things. I agree that it could be tedious to implement, so if I were asked to choose, I would pick just stdout forwarding, to stay informed.
For reference, here is an example of my current approach and its output: I use caching to avoid fitting models during report generation.
+1 This would be immensely useful. I am trying to do something similar (which, incidentally, is also related to ml 😉) and would love to have live access to the executing code blocks' stdout in order to detect compilation errors right away.
I usually use codebraid to generate plots and save them to a file inside a codeblock, after which I can include the image in my document using regular markdown. Hence I always have to use hide=stdout+stderr to prevent the output from cluttering the document, which has the significant downside that I receive no feedback about errors during compilation. I have to infer from the document's contents that something went awry.
In my opinion it is unnecessary to merge the stdout of codeblocks with that of codebraid itself if that would be problematic to implement. As you have to buffer every codeblock's stdout/stderr anyway, why not simply dump it into a file? Let codebraid --stdout-dump="_codebraid/stdout.log" --stderr-dump="_codebraid/stderr.log" ... append all output to the log files, with which users of codebraid are free to do whatever they wish, such as inspecting it after a compilation. As we don't want 10K writes to a physical file per training run of a machine learning model, the buffering scheme should be configurable via a flag. codebraid --dump-scheme=block .. might compile and dump the stdout on a per-codeblock basis, while codebraid --dump-scheme=line should write out after each line of produced stdout. That would allow us to use a named pipe instead of a physical file and run tail -f on it, which is almost exactly the same as having direct access to the codeblocks' stdout (progress bars and spinners won't work, which I can live with).
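To make the two proposed buffering schemes concrete, here is a minimal Python sketch. It is not codebraid code and the --dump-scheme flag above is only a proposal; the function name run_and_dump and its parameters are hypothetical. For brevity it merges stderr into stdout rather than writing two separate log files.

```python
import subprocess


def run_and_dump(cmd, log_path, scheme="line"):
    """Run cmd, append its output to log_path, and return the captured text.

    scheme="line": write and flush after every line (suits tail -f or a named pipe).
    scheme="block": write once, after the command has finished.
    """
    if scheme not in ("line", "block"):
        raise ValueError(f"unknown scheme: {scheme!r}")
    captured = []
    with open(log_path, "a", encoding="utf-8") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # assumption: one merged stream
            text=True,
            bufsize=1,  # line-buffered reads on the parent side
        )
        for line in proc.stdout:
            captured.append(line)
            if scheme == "line":
                log.write(line)
                log.flush()
        proc.wait()
        if scheme == "block":
            log.write("".join(captured))
    return "".join(captured)
```

With scheme="line", lines only arrive promptly if the child process itself flushes its output; otherwise they show up in bursts whenever the child's buffer fills.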
I'm sorry if what I'm suggesting sounds outlandish. Using codebraid has been so comfy from the very start that I never bothered to have a look under the hood :-) I'll catch up on that eventually! Thanks a ton for your great work. Codebraid is an absolute blessing.
In the last commit on GitHub, I've added a live_output option. You can add live_output=true to the first code chunk in a session, and you will then get stdout/stderr in the terminal live during code execution, along with information about where in the document the current output is coming from.
For things like progress bars, good results will depend on the code flushing stdout/stderr or perhaps using line buffering. @grwlf @slavistan if one of you can try the dev version from GitHub with one of your documents, that would be helpful in making sure everything is working. You may need to try adding executable="python3 -u" (or something similar) to the first code chunk of a session to run Python with unbuffered output if progress bars etc. don't work smoothly. Some additional details about live_output are in the README.
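On the Python side, flushing explicitly is an alternative to running the interpreter with -u. The following is a small illustrative sketch of a progress indicator that flushes after every update; the function name show_progress and its parameters are made up for this example and are not part of codebraid.

```python
import sys
import time


def show_progress(total, delay=0.0, stream=None):
    """Print an in-place 'step/total' counter, flushing after every update."""
    if stream is None:
        stream = sys.stdout
    for step in range(1, total + 1):
        # "\r" moves the cursor back to the start of the line, so each update
        # overwrites the previous one in a terminal. flush=True pushes the
        # text out immediately instead of leaving it in the buffer.
        print(f"\r{step}/{total}", end="", file=stream, flush=True)
        time.sleep(delay)
    print(file=stream, flush=True)  # finish the line
```

Without the flush, output written to a pipe is typically block-buffered, so a parent process reading it would see nothing until the buffer fills or the program exits.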
There currently isn't any support for Jupyter kernels, just for the built-in code execution system. I think Jupyter kernels should be possible once everything is working well for the built-in system.
At some point, I will also look into adding a command-line option, maybe --live-output, so that this can be turned on without modifying a document. I will also see about filtering output for \r so that progress bars etc. only appear in a document in their final, complete form.
Cool! It works perfectly for bash code blocks, whereas Python code blocks do indeed require the executable="python3 -u" parameter or, alternatively, print(..., flush=True).
For anyone else interested in a quick test, source this:
# install codebraid from github
sudo pip3 uninstall -y codebraid # or pip3 --user
sudo pip3 install git+https://github.com/gpoore/codebraid
# create the markdown
cd $(mktemp -d)
printf '
```{.python .cb.run executable="python3 -u" session=test_py live_output=true}
import time
for ii in range(5):
    print(ii+1, "/5", sep="")
    time.sleep(1)
```
' > markdown.md
# compile with codebraid
codebraid pandoc markdown.md --to html5 -o output.html --overwrite
I'll keep tinkering around with it and will return with feedback. Thanks a lot!
I believe all features for this are now implemented, so I'm closing the issue.
live_output=true can be set for an individual session as part of the attributes for the first code chunk in the session. Live output can also be enabled as the default for an entire document with the command-line option --live-output. Output works with both the built-in code execution system and Jupyter kernels.
A new release including all these features will be on PyPI soon.