Calysto / metakernel

Jupyter/IPython Kernel Tools

Error method is noncompliant to Jupyter protocol

JMurph2015 opened this issue · comments

I posted this issue on the octave_kernel repo, but after digging into the source, I discovered it may be generic to all kernels based on Metakernel.
So here's the issue description again: when a kernel errors out, you should be sending back an execute_reply like this:

```
{
    # ...
    'content': {
        'status': 'error',
        'ename': <your_error_name>,
        'evalue': <your_error_data>,
        'traceback': <your_traceback>,
    },
    # ...
}
```

That way Jupyter can process that as a proper error and not just assume it is any old STDOUT.
Right now the easiest (?) fix would be to just rename what you are already returning to 'traceback' and insert some dummy values for the fields that you can't get easily.

It would be really awesome if this were fixed because it's breaking my use case!
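For illustration, here is a minimal sketch of assembling that error content in Python. The function name and the error values are made up for the example; only the field names (`status`, `ename`, `evalue`, `traceback`) come from the Jupyter messaging spec as quoted above.

```python
# Sketch: building the 'content' payload of an execute_reply for an
# error, per the Jupyter messaging spec. Names/values here are dummies.
def make_error_content(ename, evalue, traceback_lines):
    """Return an execute_reply content dict with status 'error'."""
    return {
        'status': 'error',
        'ename': ename,               # e.g. the exception class name
        'evalue': evalue,             # short description of the error
        'traceback': traceback_lines, # list of strings, one per frame/line
    }

content = make_error_content(
    'OctaveError',
    "'foo' undefined",
    ["error: 'foo' undefined"],
)
```

A kernel would put this dict into the `content` of its execute_reply (and typically also broadcast a matching `error` message on IOPub) so frontends render it as an error rather than plain stdout.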

Thanks for the report! Can you give a bit more detail on your suggested fix? What do you mean by renaming "what you are already returning"? Any chance of actually making a PR?

See also fixes and discussion here: #147

Oops, I saw that #147 was actually closed, but please take a look at the comments there.

TL;DR: Metakernel needs to either modify pexpect some more, or use pexpect's fd-oriented classes. This is really breaking for me, pls help.

Ok, let's try to nail down exactly what the issue is and what should be done. Are you saying that if a kernel prints on stderr, that output should be interpreted and returned as an error reply?

This is a fundamental limitation of using pexpect: pty does not distinguish between stderr and stdout. Using raw fds is a road fraught with peril due to locking. I don't see a way to support this use case.
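The limitation described above can be demonstrated with nothing but the stdlib (this is generic, POSIX-only Python, not metakernel code): both of the child's streams arrive on the single pty master fd, so the parent cannot tell them apart.

```python
# POSIX-only sketch: a pty merges the child's stdout and stderr onto
# one master fd, so the parent cannot tell which stream a byte came from.
import os
import pty

pid, master_fd = pty.fork()
if pid == 0:
    # Child: write one line to each stream, then exit.
    os.write(1, b"out\n")
    os.write(2, b"err\n")
    os._exit(0)

data = b""
while True:
    try:
        chunk = os.read(master_fd, 1024)
    except OSError:  # Linux raises EIO once the child side closes
        break
    if not chunk:
        break
    data += chunk
os.waitpid(pid, 0)
os.close(master_fd)
# data now holds both lines interleaved on one fd; the origin is lost.
```

Everything read from `master_fd` looks identical regardless of which stream the child wrote it to, which is exactly why pexpect-over-pty cannot route stderr separately.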

That's the idea. At the least, add that option to Metakernel and flag octave_kernel to use it. I can't guarantee that whenever a kernel prints to stderr it's a halting error, but Octave at the very least will print something matching r".+ error: .+" when there is a halting error. (Sorry, that regex may be a little off.)

And it's not a use case. It's a fundamental feature of being a Jupyter kernel. (that also happens to be my use case.)

I understand, but it is also a fundamental limitation of talking to a spawned process from python. The bash kernel has the same limitation.

It seems like we could at least make any pexpect kernel print out a special identifier that would signal that an error has occurred, right?

Can Pexpect be modified to distinguish between the two (at least optionally)?

I've thought about some kind of API that would allow pexpect to send image data back directly through some kind of mark-up codes. It seems like the same could be done to signal an error.

And I guess the problem is that Metakernel at the moment can't even tell whether there was an error, because it's using Pexpect, which presently doesn't report which stream a given piece of output came from. If it did, I think the proper place for the rest of the necessary modifications would be in Metakernel.

@dsblank, it could be possible to thread something like that through. @JMurph2015, the pty module in core python is the one with the limitation.

hmmm, but Popen processes can have their streams distinguished iirc. Of course you might not be remotely based on it, but how do they do it?

Yes, it is easy to read it after the fact. Reading from both stderr and stdout incrementally is the problem.

Also, without a pty you can't interrupt the Octave process, so you'd be losing that capability of the kernel as well.
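The contrast above can be sketched with stdlib pipes (again, generic Python rather than metakernel code): `subprocess` keeps the streams separate, and incremental reading needs something like `select()` to avoid deadlocking on a full pipe buffer.

```python
# Sketch: subprocess pipes keep stdout and stderr separate, unlike a pty.
# select() is one POSIX way to read both incrementally without deadlock.
import os
import select
import subprocess
import sys

proc = subprocess.Popen(
    [sys.executable, "-c",
     "import sys; sys.stdout.write('out\\n'); sys.stderr.write('err\\n')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

chunks = {"stdout": b"", "stderr": b""}
names = {proc.stdout.fileno(): "stdout", proc.stderr.fileno(): "stderr"}
open_fds = set(names)
while open_fds:
    # Wait until at least one pipe has data (or EOF) ready.
    ready, _, _ = select.select(list(open_fds), [], [])
    for fd in ready:
        block = os.read(fd, 1024)
        if block:
            chunks[names[fd]] += block
        else:  # EOF on this stream
            open_fds.discard(fd)
proc.wait()
```

As noted above, though, going pipe-only trades away the pty, and with it the ability to interrupt a REPL like Octave.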

So this may be a Python core issue (assuming pty is a hard dependency), but there isn't a good reason that something generically communicating with another process shouldn't expose all of its streams, since they are all just file descriptors anyway.

/ The CPython pty module is like 170 lines long. It wouldn't be incredibly difficult for someone (that someone might be me today) to modify it and make a new module, perhaps superpty (or maybe pity 😆), that supports segregating stdout and stderr.
Edit: typo
Update: Would anyone actually use said modifications? Because if Pexpect doesn't at least optionally enable it, then it's kinda pointless for me to make the module.

Hi!
Bump on that question of whether or not the stack could pull support down from a hypothetical segregated fd pty module.

That would mean that pexpect would have to depend on an extension module, or we'd also have to replicate much of what pexpect is doing and depend on an extension module. I am 👎 on the idea.

It might perhaps depend on how much code we'd have to rely on that is outside the standard library. But like @blink1073 suggests, it would not be good to fork a big chunk of code. Can it be done by subclassing? Is it smallish in size? Is it something the Python standard library would want to include?

200 lines of code presently. It's unclear how much of Pexpect would need refactoring. The options I put in are entirely optional arguments, but to actually use segregated streams would definitely change at least a few things on Pexpect's side.

The pty core module itself is relatively tiny at 170 lines total, because it could be described as a few convenience functions that just garble the stdin, stdout, and stderr file descriptors together.

I'd say make a PR and let's check it out.

@blink1073 what are the odds something like a ptyplus module could find a home in pexpect, which would do this segregated reading so that any code pexpect depends on is either in base or in pexpect?

/ Sorry for the continued discussion here, but I'm still formulating who, what, and where these changes are going to be made (best to go in with a plan, right?). I think I can even preserve pexpect's standard "read all" mode with some trickery: pipe anything written to a separate stderr onto a "unified" fd that combines the three, just like old times. The tricky part then is: if you tell pexpect to read until EOF on the "unified" fd, how do you know how much of that came from the stderr fd and how much came from the stdout fd? So I'm about halfway to a comprehensive solution. There are also other ways the reading problem could be addressed, like reading each stream incrementally and waiting for both to hit EOF, or some combination of conditions.

Or you could have it read from the unified stream and check if each chunk it reads is the next thing on the top of a stream-specific fd.
edit: clarity

pexpect would have to be refactored to add an optional handling of stderr if that enhanced module were available. I'm not sure how it would interact with the .expect() method. We'd be expecting a prompt, but would need to handle an out-of-band stderr and potentially bail.

@JMurph2015 Are you sure that this is the right approach? What about a pexpect-based kernel that doesn't write to stderr? It still needs a method to signal an error. I'm still wondering about a special text signal that could signal an error, and even an API for routing text to stderr in a jupyter frontend, even if it didn't come from stderr in the external process.

@dsblank, I like the idea, but I just don't know how one would be able to get that inserted into a runtime's error processing loop. For example, how would you be able to direct Octave that it should print some special characters every time it hits an error so that Metakernel and/or Octave kernel can pick that up?

Or, what happens if the REPL segfaults and Octave stops altogether? (This one may be easier, because Pexpect should know if the process crashes altogether.)

Octave prints `error:` or `syntax error:` at the start of the line for errors. And right, we can catch a REPL segfault.

@blink1073 Not to get too idealist here, but you technically lose some functionality that way, because you would then prohibit the first thing a user prints from matching the regex ".* error: ($s).*".

That's been exactly my workaround for my project so far, but my whole team agrees that it isn't great to have to tell users "Specific kernels don't process errors out of band, so please refer to our compatibility table and follow those rules whenever using those kernels"

edit: readability

I had a pretty decent idea just a few minutes ago that might help us all out. If we called the REPL in question through a bash pipe that padded stderr with somewhat long sequences of unprintable characters (that we know), then we could just regex on those sequences later. If they were long, uncommon, and unprintable, it's extremely unlikely they'd be something a user cared about seeing, and I could feel confident in calling it an actual error.
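A rough sketch of that sentinel idea. The marker bytes are an arbitrary assumption, and for brevity the child here brackets its own stderr line, where in practice a shell pipe on stderr would do the wrapping:

```python
# Sketch of the sentinel idea: bracket stderr output with an unlikely
# unprintable marker, merge the streams, then split the marked spans
# back out. The marker choice is an assumption, not an existing API.
import re
import subprocess
import sys

MARK = "\x1e\x1e\x1e"  # ASCII record separators: unprintable, uncommon

# Child simulates a REPL whose stderr has been wrapped in MARK markers.
code = (
    "import sys;"
    "sys.stdout.write('ans = 1\\n');"
    f"sys.stderr.write('{MARK}error: oops{MARK}\\n')"
)
merged = subprocess.run(
    [sys.executable, "-c", code],
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
).stdout.decode()

m = re.escape(MARK)
errors = re.findall(f"{m}(.*?){m}", merged, re.S)   # recovered stderr
clean = re.sub(f"{m}.*?{m}", "", merged, flags=re.S)  # stdout-only view
```

The kernel could then report the recovered spans as a proper error reply while passing the cleaned text through as ordinary stdout.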

That sounds reasonable.

@JMurph2015 Yes, I was thinking along those lines. If we can make it an API (of sorts) that would be very useful in other situations.

So the train has kinda left the station for me to do this on my internship time, but I can probably hack around a little and talk about what this should look like in my free time. Maybe we should make an empty PR or something to talk out how this should look?