capnproto / go-capnp

Cap'n Proto library and code generator for Go

Home Page:https://capnproto.org

Use after free on importClients

zenhack opened this issue

When attempting to upload a large directory with ocap-merkledag, I see an imported capability being released and then subsequently passed to another call, resulting in an exception due to the invalid export id. This is indicative of a bug in go-capnp, and @lthibault suggested that it might be the source of a problem he'd observed in wetware as well. There is a formatted trace of the entire run here:

https://mirror.zenhack.net/pub/trace.log

Unfortunately it is not a particularly minimal example.

Notes on the format of the trace:

  • The "Send" or "Recv" tag indicates whether the message being logged was sent or received; this is recorded from the client's perspective.
  • The content fields in the call/return payloads have been nulled out, because they include some large byte blobs that aren't interesting for our purposes.

I was digging around and found something suspicious:

https://github.com/capnproto/go-capnproto2/blob/6ab2f9d6da5b36d64658c7425cd1afbd8ccee58e/rpc/question.go#L243-L244

There's a similar call site in import.go, for importClients. The non-error result of fillPayloadCapTable is a map[exportID]int32 that records what changes we need to make to our export table's refcounts due to the payload, but we never actually use it -- this could easily result in dropping the export sooner than we should, due to a too-low refcount.
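To make the concern concrete, here is a minimal Go sketch of what dropping that map could do. It is not go-capnp's actual code: only `exportID` and the `map[exportID]int32` shape come from the description above, while `exportTable`, `applyRefs`, and `release` are hypothetical names invented for illustration, and the table is assumed to be a plain refcount map.

```go
package main

import "fmt"

// exportID matches the type name mentioned in the issue.
type exportID uint32

// exportTable is a hypothetical, simplified stand-in for an export table:
// it only tracks a reference count per export.
type exportTable struct {
	refs map[exportID]int32
}

// applyRefs adds the per-export deltas recorded while filling a payload's
// cap table. This is the step the issue suspects is being skipped.
func (t *exportTable) applyRefs(deltas map[exportID]int32) {
	for id, n := range deltas {
		t.refs[id] += n
	}
}

// release drops n references and reports whether the export was removed.
func (t *exportTable) release(id exportID, n int32) bool {
	t.refs[id] -= n
	if t.refs[id] <= 0 {
		delete(t.refs, id)
		return true
	}
	return false
}

func main() {
	// The payload hands the peer one more reference to export 7.
	deltas := map[exportID]int32{7: 1}

	// Deltas ignored: a single release removes the export even though
	// the peer still holds the reference sent in the payload.
	ignored := &exportTable{refs: map[exportID]int32{7: 1}}
	fmt.Println(ignored.release(7, 1)) // true

	// Deltas applied: the export survives the same release.
	applied := &exportTable{refs: map[exportID]int32{7: 1}}
	applied.applyRefs(deltas)
	fmt.Println(applied.release(7, 1)) // false
}
```

In the first case the peer could later refer to an export id that no longer exists, which would look like the invalid export id seen in the trace.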

I'm not 100% sure this is the source of this bug, but it seems worth fixing before investigating other possible causes.

I can no longer reproduce this on main. I did a git bisect to figure out which commit fixed it, and it came up with 05c2793. It seems plausible to me that that would have fixed the issue.

The refcounting issue I pointed to above still seems wrong though...

The refcounting is actually fine; see #331. Since 05c2793 seems to have fixed the actual observable problem, I'm going to close this.