Null client received in TestPromiseOrdering

Question

Null client received in TestPromiseOrdering

lthibault opened this issue a year ago · comments

davidhubbard · Answer 1 · Thu Aug 24 2023 11:47:17 GMT+0800 (China Standard Time)

For context, Apr 28 @zenhack wrote the following to @lthibault in the go-capnp Matrix channel:

how tolerant are you to regressions? I'm getting to the point where I'm debating whether just gutting some sections of this and re-implemening, getting the bones right without worrying about keeping it bug-free every step of the way, and then hunting stuff down later would make this easier. Hopefully the way I do it would mean fewer bugs in the long run.

...I may not regardless, but it's a thought that's beginning to haunt me

@lthibault and I decided it'd be informative, at least, to do a quick git bisect from the last tagged release at v0.30 (May 15) through the present, though it will mean I'll have to backport TestPromiseOrdering which was added in #530 to main on Jun 29. Should be very doable.

The bisect will attempt to track down if any particular commit by @zenhack - or anyone else for that matter - stands out as a moment in time when TestPromiseOrdering broke.

tl;dr: if we had TestPromiseOrdering back when v0.30 was released, this issue might have been caught. The idea is to apply TestPromiseOrdering at each commit and see if this theory holds, and we can point at a single commit and say, "this is why TestPromiseOrdering is failing."

davidhubbard · Answer 2 · Tue Aug 29 2023 11:05:33 GMT+0800 (China Standard Time)

I've successfully ported TestPromiseOrdering() on top of v0.30 (May 15) by sort-of-cherry-picking @zenhack's #530 of Jun 29. The unit test now passes against that May 15 code.

I've already found and fixed an E-Ordering issue, which appeared immediately when the unit test was first merged on Jun 29.

I'll continue working on a branch. The next step will be to rework the commit on top of Jun 26 which introduces ClientSnapshot. While challenging, I think I see how to do it. At that point I'll be "caught up" with where @zenhack was when he first merged the unit test on Jun 29.

The final step to reach a pull request is to catch up to the present day -- there are a few more commits from 390b049 to 163c79d mostly around the new structured logger interface. They touch a different section of the code.

I'll send it when I'm all caught up. The nice thing about using a side-branch while I time-travel back 3 months (git is so cool), is I can keep running the fixed unit test as I work my way forward in time. The commits in my side-branch will be overwritten. I plan to force-push the branch to bring it forward in time.

That the unit test wasn't working in 7e1bedd suggests @zenhack had a solution way back then. I would've really liked to hear his thoughts. :(

tl;dr: it's looking doable to fix TestPromiseOrdering(), though we're sorely missing @zenhack and will probably not fully grasp his vision.

Louis Thibault · Answer 3 · Tue Aug 29 2023 23:33:49 GMT+0800 (China Standard Time)

This is awesome! I'd very much welcome the opportunity to sync up on this with you, at your convenience, just to keep the proverbial bus factor above 1.

Does it make sense to eagerly merge your fix into main?

davidhubbard · Answer 4 · Fri Sep 01 2023 12:40:00 GMT+0800 (China Standard Time)

Quick dev update: I've force-pushed on my branch, now only 28 commits behind HEAD.

~~It panics in a different test (TestSendDisembargo) with the impent getting Shutdown() from the Client and releasing the import entirely -- I'm still working on the reference counting.~~

Update: I've force-pushed again and it now passes all tests ... except TestPromiseOrdering, which runs to completion. i.e. TestPromiseOrdering no longer gets the Null client which this issue is all about; instead the problem now is that promises are not fulfilled in E-Order... But this is pretty exciting.

The previous commit is still present until it is garbage-collected.