Calysto / metakernel

Jupyter/IPython Kernel Tools

Error method is noncompliant to Jupyter protocol

JMurph2015 opened this issue · comments

I posted this issue on the octave_kernel repo, but after digging into the source, I discovered it may be generic to all kernels based on Metakernel.
So here's the issue description again: when a kernel errors out, you should be sending back an execute_reply like this:

```
{
    # ...
    'content': {
        'status': 'error',
        'ename': <your_error_name>,
        'evalue': <your_error_data>,
        'traceback': <your_traceback>,
    },
    # ...
}
```

That way Jupyter can process that as a proper error and not just assume it is any old STDOUT.
Right now the easiest (?) fix would be to just rename what you are already returning to 'traceback' and insert some dummy values for the fields that you can't get easily.

It would be really awesome if this were fixed because it's breaking my use case!
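For illustration, here is a minimal sketch of assembling that error content in Python. The function name and the error values are made up for the example; only the field names (`status`, `ename`, `evalue`, `traceback`) come from the Jupyter messaging spec as quoted above.

```python
# Sketch: building the 'content' payload of an execute_reply for an
# error, per the Jupyter messaging spec. Names/values here are dummies.
def make_error_content(ename, evalue, traceback_lines):
    """Return an execute_reply content dict with status 'error'."""
    return {
        'status': 'error',
        'ename': ename,               # e.g. the exception class name
        'evalue': evalue,             # short description of the error
        'traceback': traceback_lines, # list of strings, one per frame/line
    }

content = make_error_content(
    'OctaveError',
    "'foo' undefined",
    ["error: 'foo' undefined"],
)
```

A kernel would put this dict into the `content` of its execute_reply (and typically also broadcast a matching `error` message on IOPub) so frontends render it as an error rather than plain stdout.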

Thanks for the report! Can you give a bit more detail on your suggested fix? What do you mean by renaming "what you are already returning"? Any chance of actually making a PR?

See also fixes and discussion here: #147

Oops, I saw that #147 was actually closed, but please take a look at the comments there.

TL;DR: Metakernel needs to either modify pexpect some more, or use pexpect's fd-oriented classes. This is really breaking for me, pls help.

Ok, let's try to nail down exactly what the issue is and what should be done. Are you saying that if a kernel prints on stderr, that output should be interpreted and returned as an error reply?

This is a fundamental limitation of using pexpect: pty does not distinguish between stderr and stdout. Using raw fds is a road fraught with peril due to locking. I don't see a way to support this use case.
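The limitation described above can be demonstrated with nothing but the stdlib (this is generic, POSIX-only Python, not metakernel code): both of the child's streams arrive on the single pty master fd, so the parent cannot tell them apart.

```python
# POSIX-only sketch: a pty merges the child's stdout and stderr onto
# one master fd, so the parent cannot tell which stream a byte came from.
import os
import pty

pid, master_fd = pty.fork()
if pid == 0:
    # Child: write one line to each stream, then exit.
    os.write(1, b"out\n")
    os.write(2, b"err\n")
    os._exit(0)

data = b""
while True:
    try:
        chunk = os.read(master_fd, 1024)
    except OSError:  # Linux raises EIO once the child side closes
        break
    if not chunk:
        break
    data += chunk
os.waitpid(pid, 0)
os.close(master_fd)
# data now holds both lines interleaved on one fd; the origin is lost.
```

Everything read from `master_fd` looks identical regardless of which stream the child wrote it to, which is exactly why pexpect-over-pty cannot route stderr separately.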

That's the idea. At the least, add that option to Metakernel and flag octave_kernel to use it. I can't guarantee that whenever a kernel prints to stderr it's a halting error, but Octave at the very least will print something matching r".+ error: .+" when there is a halting error. (Sorry, that regex may be a little off.)

And it's not a use case. It's a fundamental feature of being a Jupyter kernel. (that also happens to be my use case.)

I understand, but it is also a fundamental limitation of talking to a spawned process from python. The bash kernel has the same limitation.

It seems like we could at least make any pexpect kernel print out a special identifier that would signal that an error has occurred, right?

Can Pexpect be modified to distinguish between the two (at least optionally)?

I've thought about some kind of API that would allow pexpect to send image data back directly through some kind of mark-up codes. It seems like the same could be done to signal an error.

And I guess the problem is that Metakernel at the moment can't even tell whether there was an error, because it's using Pexpect, which presently doesn't report which stream a given piece of output came from. If it did, I think the proper place for the rest of the necessary modifications would be in Metakernel.

@dsblank, it could be possible to thread something like that through. @JMurph2015, the pty module in core python is the one with the limitation.

hmmm, but Popen processes can have their streams distinguished iirc. Of course you might not be remotely based on it, but how do they do it?

Yes, it is easy to read it after the fact. Reading from both stderr and stdout incrementally is the problem.

Also, without a pty you can't interrupt the Octave process, so you'd be losing that capability of the kernel as well.
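The contrast above can be sketched with stdlib pipes (again, generic Python rather than metakernel code): `subprocess` keeps the streams separate, and incremental reading needs something like `select()` to avoid deadlocking on a full pipe buffer.

```python
# Sketch: subprocess pipes keep stdout and stderr separate, unlike a pty.
# select() is one POSIX way to read both incrementally without deadlock.
import os
import select
import subprocess
import sys

proc = subprocess.Popen(
    [sys.executable, "-c",
     "import sys; sys.stdout.write('out\\n'); sys.stderr.write('err\\n')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

chunks = {"stdout": b"", "stderr": b""}
names = {proc.stdout.fileno(): "stdout", proc.stderr.fileno(): "stderr"}
open_fds = set(names)
while open_fds:
    # Wait until at least one pipe has data (or EOF) ready.
    ready, _, _ = select.select(list(open_fds), [], [])
    for fd in ready:
        block = os.read(fd, 1024)
        if block:
            chunks[names[fd]] += block
        else:  # EOF on this stream
            open_fds.discard(fd)
proc.wait()
```

As noted above, though, going pipe-only trades away the pty, and with it the ability to interrupt a REPL like Octave.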

So this may be a Python core issue (assuming pty is a hard dependency), but there isn't a good reason that something generically communicating with another process shouldn't expose all of its streams, since they are all just file descriptors anyway.

/ The CPython pty module is like 170 lines long. It wouldn't be incredibly difficult for someone (that someone might be me today) to modify it and make a new module, perhaps superpty (or maybe pity 😆), that supports segregating stdout and stderr.
Edit: typo
Update: Would anyone actually use said modifications? Because if Pexpect doesn't at least optionally enable it, then it's kinda pointless for me to make the module.

Hi!
Bump on that question of whether or not the stack could pull support down from a hypothetical segregated fd pty module.

That would mean that pexpect would have to depend on an extension module, or we'd also have to replicate much of what pexpect is doing and depend on an extension module. I am 👎 on the idea.

It might perhaps depend on how much code we'd have to rely on that is outside the standard library. But like @blink1073 suggests, it would not be good to fork a big chunk of code. Can it be done by subclassing? Is it smallish in size? Is it something the Python standard library would want to include?

200 lines of code presently. It's unclear how much of Pexpect would need refactoring. The options I put in are entirely optional arguments, but to actually use segregated streams would definitely change at least a few things on Pexpect's side.

The pty core module itself is relatively tiny at 170 lines total, because it could be described as a few convenience functions that just garble the stdin, stdout, and stderr file descriptors together.

I'd say make a PR and let's check it out.

@blink1073 what are the odds something like a ptyplus module could find a home in pexpect, which would do this segregated reading so that any code pexpect depends on is either in base or in pexpect?

/ Sorry for the continued discussion here, but I'm still formulating who, what, and where these changes are going to be made (best to go in with a plan, right?). I think I can even preserve pexpect's standard "read all" mode with some trickery: pipe anything written to a separate stderr onto a "unified" fd that combines the three, just like old times. The tricky part then is: if you tell pexpect to read until EOF on the "unified" fd, how do you know how much of that came from the stderr fd and how much came from the stdout fd? So I'm about halfway to a comprehensive solution. There are also other ways the reading problem could be addressed, like reading each stream incrementally and waiting for both to hit EOF, or some combination of conditions.

Or you could have it read from the unified stream and check if each chunk it reads is the next thing on the top of a stream-specific fd.
edit: clarity

pexpect would have to be refactored to add an optional handling of stderr if that enhanced module were available. I'm not sure how it would interact with the .expect() method. We'd be expecting a prompt, but would need to handle an out-of-band stderr and potentially bail.

@JMurph2015 Are you sure that this is the right approach? What about a pexpect-based kernel that doesn't write to stderr? It still needs a method to signal an error. I'm still wondering about a special text signal that could signal an error, and even an API for routing text to stderr in a jupyter frontend, even if it didn't come from stderr in the external process.

@dsblank, I like the idea, but I just don't know how one would be able to get that inserted into a runtime's error processing loop. For example, how would you be able to direct Octave that it should print some special characters every time it hits an error so that Metakernel and/or Octave kernel can pick that up?

Or, what happens if the REPL segfaults and Octave stops altogether? (This one may be easier, because Pexpect should know if the process crashes altogether.)

Octave prints `error:` or `syntax error:` at the start of the line for errors. And right, we can catch a REPL segfault.

@blink1073 Not to get too idealist here, but you technically lose some functionality that way, because you would then prohibit the first thing a user prints from matching the regex ".* error: ($s).*".

That's been exactly my workaround for my project so far, but my whole team agrees that it isn't great to have to tell users "Specific kernels don't process errors out of band, so please refer to our compatibility table and follow those rules whenever using those kernels"

edit: readability

I had a pretty decent idea just a few minutes ago that might help us all out. If we called the REPL in question through a bash pipe that padded stderr with somewhat long sequences of unprintable characters (that we know), then we could just regex on those sequences later. If they were long, uncommon, and unprintable, it's extremely unlikely they'd be something a user cared about seeing, and I could feel confident in calling it an actual error.
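A rough sketch of that sentinel idea. The marker bytes are an arbitrary assumption, and for brevity the child here brackets its own stderr line, where in practice a shell pipe on stderr would do the wrapping:

```python
# Sketch of the sentinel idea: bracket stderr output with an unlikely
# unprintable marker, merge the streams, then split the marked spans
# back out. The marker choice is an assumption, not an existing API.
import re
import subprocess
import sys

MARK = "\x1e\x1e\x1e"  # ASCII record separators: unprintable, uncommon

# Child simulates a REPL whose stderr has been wrapped in MARK markers.
code = (
    "import sys;"
    "sys.stdout.write('ans = 1\\n');"
    f"sys.stderr.write('{MARK}error: oops{MARK}\\n')"
)
merged = subprocess.run(
    [sys.executable, "-c", code],
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
).stdout.decode()

m = re.escape(MARK)
errors = re.findall(f"{m}(.*?){m}", merged, re.S)   # recovered stderr
clean = re.sub(f"{m}.*?{m}", "", merged, flags=re.S)  # stdout-only view
```

The kernel could then report the recovered spans as a proper error reply while passing the cleaned text through as ordinary stdout.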

That sounds reasonable.

@JMurph2015 Yes, I was thinking along those lines. If we can make it an API (of sorts) that would be very useful in other situations.

So the train has kinda left the station for me to do this on my internship time, but I can probably hack around a little and talk about what this should look like in my free time. Maybe we should make an empty PR or something to talk out how this should look?