heap_free_all frees sealed objects

Question

davidchisnall opened this issue a year ago

The heap_free_all function frees all objects allocated with an allocation capability. This means that it can be used to attack compartments that have allocated memory for a given caller.

It shouldn't, and we should provide a heap_free_all_sealed that frees everything sealed with a particular type that can be used for cleanup.

This is somewhat complicated by the fact that we do want to drop claims on sealed objects, we just don't want to allow them to actually be freed. @nwf, what do you think the right behaviour is?

David Chisnall · Answer 1 · Tue Jan 16 2024 17:25:28 GMT+0800 (China Standard Time)

Thinking a bit more about this after doing a load of network stack things:

In the network stack, at least, sockets are reachable only via the sealed capability, and each socket structure points to a load of other things that the network stack owns on its behalf, so there is no way to gracefully clean them up. Even a heap_free_all variant that requires an authorising sealing capability would not help here, because the network stack needs to gracefully clean things up itself (which it actually does by posting a message to another thread to serialise all deallocation events).

The API that I think I want is actually more like this:

int heap_free_all_sealed(Timeout *timeout, SObj mallocCapability, __cheri_callback void (*callback)(SObj));

This would find all sealed objects allocated with (or claimed by) mallocCapability and call callback on them. The callback could then gracefully free everything by dispatching to the correct cleanup function for all sealed objects that it knows about.

For this to be useful, I think we need two additional APIs. The first lives in the token library:

uint32_t token_type(SObj);

This returns the type. When you're handed a sealed capability, you can use this to record the type so that you know where to dispatch the cleanup.
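A minimal sketch of how such a callback might dispatch on the recorded type. Everything here is a mock that runs on an ordinary host: SObj is a plain struct pointer rather than a sealed capability, token_type is implemented over that struct, mock_heap_free_all_sealed takes an explicit array in place of the timeout and malloc capability, and SocketType, QueueType, and the counters are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a sealed object. The real SObj is an opaque sealed
 * capability; this mock just records a type and a cleanup flag. */
struct MockSealed {
    uint32_t type;    /* sealing type, as token_type() would report it */
    int      cleaned; /* set once a cleanup routine has handled it */
};
typedef struct MockSealed *SObj;

/* Hypothetical type identifiers that a compartment recorded earlier. */
enum { SocketType = 1, QueueType = 2 };

/* Mock of the proposed token_type(): returns the sealing type. */
static uint32_t token_type(SObj obj)
{
    return obj->type;
}

static int socketsClosed;
static int queuesDestroyed;
static int unknownSeen;

/* Callback: dispatch to the right cleanup path based on the type. */
static void cleanup_callback(SObj obj)
{
    switch (token_type(obj)) {
    case SocketType:
        obj->cleaned = 1;
        socketsClosed++;
        break;
    case QueueType:
        obj->cleaned = 1;
        queuesDestroyed++;
        break;
    default:
        /* Not a type this compartment knows how to clean up. */
        unknownSeen++;
        break;
    }
}

/* Mock of the proposed heap_free_all_sealed(): the timeout and malloc
 * capability are replaced by an explicit array of objects to visit.
 * Returns the number of objects visited. */
static int mock_heap_free_all_sealed(SObj *objects, size_t count,
                                     void (*callback)(SObj))
{
    assert(objects != NULL && callback != NULL);
    for (size_t i = 0; i < count; i++) {
        callback(objects[i]);
    }
    return (int)count;
}
```

Note that the default case can only count the objects it does not recognise; the callback has no way to report them back to the allocator.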

I think we also need:

int heap_claims_count(void*);

This returns the number of claims that exist on an object (or an error) and can be called with sealed or unsealed objects. In the network stack's socket close function, we require that the caller pass their own malloc capability so that we can drop the claim. We don't free it if dropping the claim fails, so that you can trivially separate the ability to allocate and deallocate sockets from the ability to send and receive over them. This means that we need a fallback path for 'oh, dropping the claim failed, but you've already dropped the claim via heap_free_all so that's fine' (as a side effect, anyone with the socket handle can close it if the compartment that owns it has reset itself, but I don't think that's a problem).
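One way the fallback path might look, as a host-side sketch rather than the network stack's actual code. Claims are modelled as a plain counter, heap_claims_count is mocked over that counter, and MockSocket, mock_drop_caller_claim, mock_heap_free_all, and mock_socket_close are hypothetical names. The rule that 'exactly one remaining claim means only the network stack still holds it' is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock socket allocation. Claims are modelled as a plain counter:
 * one held by the network stack and, initially, one by the caller. */
struct MockSocket {
    int  claims;           /* total outstanding claims */
    bool callerHoldsClaim; /* false once the caller's claim is gone */
    bool closed;
};

/* Mock of the proposed heap_claims_count(): claims, or negative on error. */
static int heap_claims_count(struct MockSocket *s)
{
    return (s == NULL) ? -1 : s->claims;
}

/* Mock of heap_free_all on the caller's quota: drops the caller's claim
 * but, the socket being sealed, does not free it. */
static void mock_heap_free_all(struct MockSocket *s)
{
    if (s->callerHoldsClaim) {
        s->callerHoldsClaim = false;
        s->claims--;
    }
}

/* Mock of dropping the caller's claim with the capability presented. */
static int mock_drop_caller_claim(struct MockSocket *s, bool hasOwningQuota)
{
    if (!hasOwningQuota || !s->callerHoldsClaim) {
        return -1;
    }
    s->callerHoldsClaim = false;
    s->claims--;
    return 0;
}

/* Close: normally requires dropping the caller's claim. Fallback: if that
 * fails but only the network stack's own claim remains, proceed anyway. */
static int mock_socket_close(struct MockSocket *s, bool hasOwningQuota)
{
    assert(s != NULL);
    if (mock_drop_caller_claim(s, hasOwningQuota) != 0 &&
        heap_claims_count(s) != 1) {
        return -1; /* someone else still holds a claim: refuse */
    }
    s->claims--; /* release the network stack's own claim */
    s->closed = true;
    return 0;
}
```

In this model a caller without the owning quota is refused while the owner's claim is live, but succeeds once the owner has run heap_free_all, which is the side effect described above.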

@nwf / @nwf-msr, what do you think?

Nathaniel Filardo, once at MS Research · Answer 2 · Fri Jan 19 2024 04:00:53 GMT+0800 (China Standard Time)

I like this API better, to be sure, but I worry that

int heap_free_all_sealed(Timeout *timeout,
                         SObj mallocCapability,
                         __cheri_callback void (*callback)(SObj));

doesn't have a way for the callback to indicate that it could not deal with a particular sealed type (perhaps because it doesn't have the authority to unseal). Maybe that's always a bug, but... could you instead use

int heap_free_all_sealed(Timeout *timeout,
                         SObj mallocCapability,
                         SKey unsealer,
                         __cheri_callback void (*callback)(uint32_t, void*));

which iteratively frees only those sealed objects that can be free'd or decref'd by mallocCapability and that could be unsealed with unsealer, after having invoked the callback on the unsealed form and its type? Some additional commentary on this API:

I don't think there's reason to worry about passing in the unsealer authority, since all the SObj/SKey machinery is a contract provided by the allocator anyway.
Passing the unsealed form to the callback avoids a libcall to the fast unsealer.
But since type discrimination might still be useful, it probably makes sense to pass the type to the callback. I am not sure if this completely obviates the present need for token_type; if it does not, the latter is surely easy enough to add to the fast unsealer library.
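A rough host-side model of this variant, to make the filtering behaviour concrete. It is not the allocator's implementation: SObj is a plain struct pointer, MockSKey is just the numeric type a key can unseal, and mock_unseal, teardown, and connectionsTornDown are hypothetical names. The malloc capability and timeout are omitted, and the set of candidate objects is passed in as an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a sealed object owned by the mock heap. */
struct MockSealed {
    uint32_t type;    /* sealing type */
    void    *payload; /* what unsealing would yield */
    bool     live;    /* false once freed */
};
typedef struct MockSealed *SObj;

/* Stand-in for SKey: the key is simply the type it is able to unseal. */
typedef uint32_t MockSKey;

/* Mock unseal: yields the payload only when the key matches the type. */
static void *mock_unseal(MockSKey key, SObj obj)
{
    return (obj->type == key) ? obj->payload : NULL;
}

/* Visit every live object that the key can unseal, hand the callback the
 * type and the unsealed form, then free it. Objects sealed with other
 * keys are left untouched. Returns the number of objects freed. */
static int mock_heap_free_all_sealed(SObj *objects, size_t count,
                                     MockSKey unsealer,
                                     void (*callback)(uint32_t, void *))
{
    assert(objects != NULL && callback != NULL);
    int freed = 0;
    for (size_t i = 0; i < count; i++) {
        SObj obj = objects[i];
        if (!obj->live) {
            continue;
        }
        void *unsealed = mock_unseal(unsealer, obj);
        if (unsealed == NULL) {
            continue; /* cannot unseal: not ours to clean up */
        }
        callback(obj->type, unsealed);
        obj->live = false;
        freed++;
    }
    return freed;
}

static int connectionsTornDown;

/* Example callback: the payload is an int holding per-connection state. */
static void teardown(uint32_t type, void *unsealed)
{
    (void)type; /* a single type is handled here, so no dispatch needed */
    int *state = unsealed;
    *state = 0;
    connectionsTornDown++;
}
```

Because the filter is applied before the callback runs, the callback never sees an object it cannot unseal, which sidesteps the question of how it would report one.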

David Chisnall · Answer 3 · Fri Feb 16 2024 00:38:34 GMT+0800 (China Standard Time)

I think your API has a different use case in mind, but I'm increasingly thinking it's a more sensible one.

If compartment A allocates something sealed on behalf of compartment B, I was imagining that B would call the API and then call into A to clean them up. With your model, A would call into B and say 'please free anything that you've allocated for me'.

Your approach feels somewhat nicer because the fact that these things are sealed with a specific key is somewhat irrelevant. The calling compartment doesn't want to care what types a thing allocated on its behalf has, it wants to tell things that have allocated objects for it that they should go away.

In the network stack currently, the connection object is a sealed thing allocated with the caller's quota that points to a bunch of other things (FreeRTOS+TCP state). If those things go away at surprising times then the network stack can crash, so the network stack claims them as well. This opens a potential denial of service attack, where you keep allocating connections and then calling heap_free_all, and so exhaust memory using the network stack's quota. Your variant would mean that we could potentially do a heap_free_all_sealed in the new-connection API to enumerate all connections allocated with the quota that we're passed and gracefully clean up any that are dangling.

After considering that use case a bit more, I suspect that what I actually want is a heap_visit_sealed that returns all of the sealed capabilities created with a permit-seal capability. There's a good reason for this to pass the sealed version to the callback: you can't free the object with just the unsealed one. It's probably worth passing both, because most of the time you also want the unsealed version.

That said, if the desired use case is asking another compartment to free all of the things that it allocated for you, then it may be simpler to not have this functionality in the allocator at all. If sealed things are exempted from heap_free_all then a compartment can just maintain a linked list of things that it's allocated and walk that list when it wants to do cleanup (or even periodically) to clean up any dangling things.
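The linked-list approach could look roughly like the following host-side sketch. Connection, track, reap_dangling, and the ownerGone flag are hypothetical. In particular, ownerGone stands in for whatever check the compartment would really use to decide that an object is dangling (for example, a claims count), which the discussion leaves open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-connection state that the compartment allocated for a caller. */
struct Connection {
    struct Connection *next;
    bool ownerGone; /* stand-in for "the caller's claim has been dropped" */
    bool tornDown;  /* set once graceful cleanup has run */
};

/* Head of the compartment's private list of everything it allocated. */
static struct Connection *connections;

/* Record a newly allocated connection at the head of the list. */
static void track(struct Connection *c)
{
    assert(c != NULL);
    c->next = connections;
    connections = c;
}

/* Walk the list, unlinking and tearing down every dangling connection.
 * Returns the number of connections reaped. */
static int reap_dangling(void)
{
    int reaped = 0;
    struct Connection **link = &connections;
    while (*link != NULL) {
        struct Connection *c = *link;
        if (c->ownerGone) {
            *link = c->next;   /* unlink without advancing */
            c->next = NULL;
            c->tornDown = true; /* graceful cleanup would go here */
            reaped++;
        } else {
            link = &c->next;
        }
    }
    return reaped;
}
```

A compartment could call reap_dangling from its new-connection path, so that a caller that repeatedly allocates and abandons connections also pays for their cleanup.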

In the TLS compartment, we currently have pretty strong flow isolation. Per-connection state is not reachable from any global. Not exposing a mechanism for finding sealed objects on the heap means that heap_free_all will cause in-flight TLS operations on the connection to abort but will not free the associated connection object, so the caller must keep a handle to the TLS session and free it explicitly. I think that's fine: most compartments have zero or one TLS connections.

TLS is quite a fun case because there are three compartments:

The user compartment initiates a TLS connection and owns the allocation capability.
The TLS compartment owns the TLS state and returns a sealed capability that uses this allocation capability.
The TLS state wraps a socket, which is another sealed capability allocated from the same malloc capability.

If heap_free_all doesn't free sealed things, the internal state for the socket remains claimed by the network stack but needs explicit teardown. The TLS connection state has strong flow isolation and so your cleanup process needs to be:

heap_free_all
tls_connection_close with any TLS connections that you hold handles to.

At this point, everything is gracefully cleaned up. This is quite nice.
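The two-step cleanup above can be modelled on a host with a toy heap. This is only a sketch under the assumption that heap_free_all simply skips sealed allocations. MockAllocation, the integer quota identifiers, mock_heap_free_all, and mock_tls_connection_close are hypothetical stand-ins and do not match the real signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One entry in a toy heap. */
struct MockAllocation {
    int  quota;  /* which allocation capability paid for it */
    bool sealed; /* true for sealed objects such as a TLS connection */
    bool live;   /* false once freed */
};

/* Step 1: free everything allocated with the quota, except sealed objects,
 * which are skipped. Returns the number of allocations freed. */
static int mock_heap_free_all(struct MockAllocation *heap, size_t count,
                              int quota)
{
    assert(heap != NULL);
    int freed = 0;
    for (size_t i = 0; i < count; i++) {
        struct MockAllocation *a = &heap[i];
        if (!a->live || a->quota != quota || a->sealed) {
            continue;
        }
        a->live = false;
        freed++;
    }
    return freed;
}

/* Step 2: explicit teardown through a handle that the caller kept.
 * Returns 0 on success, -1 if the handle is not a live sealed object. */
static int mock_tls_connection_close(struct MockAllocation *handle)
{
    if (handle == NULL || !handle->sealed || !handle->live) {
        return -1;
    }
    handle->live = false; /* the owning compartment cleans up gracefully */
    return 0;
}
```

The sealed entry survives step 1 and goes away only when its owner is asked to close it, which is the ordering the list describes.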

The message queue compartment uses a single (sealed) allocation for the entire queue, so it works fine if we just exclude sealed things.

So now my leaning is to not provide the visit API at all and just exclude sealed things from heap_free_all. @nwf-msr?