Separate `dealloc` from `Alloc` into other trait

TimDiekmann opened this issue 5 years ago · comments

Most (all?) of the structs mentioned in #7 only need the dealloc method for Drop. It may be useful to split Alloc into two traits. However, we haven't yet settled on the exact layout of those two traits or the relationship between them. So far, these possibilities have come up:

Make it a supertrait: trait Alloc: Dealloc { ... } (#9 (comment))
~~Associate the Alloc trait: trait Dealloc { type Alloc: Alloc; ... }~~
Introduce a trait trait GetDealloc { unsafe fn get_dealloc() -> ???; }
Same as above, but get_alloc as method in Alloc instead of an extra trait (#9 (comment))
Also split realloc into Realloc and associate Dealloc with Alloc and Realloc (#9 (comment))
- Also associate Alloc with Realloc (#9 (comment))
Even more complicated hierarchies (#9 (comment))

Edits

2019/Oct/05: Reflect the thread's current solution proposals.

Mike Hommey · Answer 1 · Sat May 04 2019 05:13:48 GMT+0800 (China Standard Time)

Other possibility:

trait Dealloc { ... } impl<T: Alloc> Dealloc for T { ... }, and change the relevant bounds to Dealloc.
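
A minimal sketch of this blanket-impl option, assuming toy traits that pass fake `usize` addresses instead of real pointers. The `Counting` allocator and the `release` helper are hypothetical names used only for illustration; `Alloc` keeps its own `dealloc` method, as the option implies.

```rust
use std::cell::Cell;

// Hypothetical stand-in for the existing trait, which keeps its `dealloc` method.
trait Alloc {
    fn alloc(&self, size: usize) -> usize;
    fn dealloc(&self, addr: usize);
}

// Hypothetical deallocation-only trait, intended for use in bounds.
trait Dealloc {
    fn dealloc(&self, addr: usize);
}

// Blanket impl: every `Alloc` is automatically a `Dealloc`.
impl<T: Alloc> Dealloc for T {
    fn dealloc(&self, addr: usize) {
        Alloc::dealloc(self, addr)
    }
}

// Toy allocator that only counts live allocations.
struct Counting {
    live: Cell<usize>,
}

impl Alloc for Counting {
    fn alloc(&self, _size: usize) -> usize {
        self.live.set(self.live.get() + 1);
        self.live.get() // fake "address"
    }
    fn dealloc(&self, _addr: usize) {
        self.live.set(self.live.get() - 1);
    }
}

// Code that only frees memory can be written against `Dealloc` alone.
fn release<D: Dealloc>(d: &D, addr: usize) {
    d.dealloc(addr);
}

// Allocates once, frees through the `Dealloc` bound, and reports live allocations.
fn live_after_round_trip() -> usize {
    let a = Counting { live: Cell::new(0) };
    let addr = a.alloc(16);
    release(&a, addr);
    a.live.get()
}
```

With this shape, a type that can only deallocate may still implement `Dealloc` directly, as long as it does not also implement `Alloc`.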

Simon Sapin · Answer 2 · Sat May 04 2019 17:31:12 GMT+0800 (China Standard Time)

only needs the dealloc method for dropping

This is not quite true. Other APIs like Box::clone or Rc::make_mut may need to allocate.

I think it’s important for this issue to provide some context and motivation.

Current proposals revolve around adding an A: Alloc type parameter to types such as Box<T>, Vec<T>, etc; and storing a value of type A inline in those structs. For "traditional" allocators like jemalloc that are process-global / singleton, A can be a zero-sized type. However for allocators that might have multiple "instances", A needs to be a handle like &_ or Arc<_> in order to associate each collection value with its corresponding allocator instance. This means e.g. doubling size_of for Box, which has non-trivial cost.

This issue is about reducing this cost in a narrow set of circumstances:

The allocator has multiple instances, so allocating requires a non-zero-size handle
Deallocation is a no-op (for example in a simple bump allocator) or otherwise doesn’t require a handle that points to the allocator instance.
And the user is willing to give up on APIs like Box::clone, such that the box only ever needs to know how to deallocate, never allocate.

In that case we could in theory have zero-size deallocation-only handles to keep in Box<T, A>, in order to keep it small.
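
The size argument can be made concrete with a small sketch. All of the types below (`BoxIn`, `GlobalHandle`, `Arena`, `ArenaHandle`, `ArenaDealloc`) are hypothetical stand-ins, not proposed API; the point is only how the inline handle affects `size_of`.

```rust
use std::mem::size_of;

// Hypothetical box layout: a pointer plus an inline allocator handle `A`.
struct BoxIn<T, A> {
    ptr: *mut T,
    handle: A,
}

// Handle for a process-global allocator: carries no data.
struct GlobalHandle;

// Stand-in for the state of one allocator instance.
struct Arena {
    bytes: [u8; 64],
}

// Handle for a multi-instance allocator: it must point at its instance.
struct ArenaHandle<'a>(&'a Arena);

// Deallocation-only handle for a bump arena: freeing is a no-op,
// so no pointer back to the arena is needed.
struct ArenaDealloc;

// Returns the box size with a global handle, a full arena handle,
// and a deallocation-only arena handle, in that order.
fn sizes() -> (usize, usize, usize) {
    (
        size_of::<BoxIn<u32, GlobalHandle>>(),
        size_of::<BoxIn<u32, ArenaHandle<'static>>>(),
        size_of::<BoxIn<u32, ArenaDealloc>>(),
    )
}
```

On current compilers the first and third sizes are one pointer and the second is two pointers, which is the doubling mentioned above.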

Simon Sapin · Answer 3 · Sat May 04 2019 18:03:54 GMT+0800 (China Standard Time)

So far I haven’t seen a complete proposal of what an API supporting this use case might look like. It’s not just the trait:

Giving up on Clone and friends needs to be an opt-in choice, so there need to be dedicated APIs on collections in any case. Does that mean e.g. Box::new_dealloc_only_in in addition to Box::new_in? What’s the signature?
Before we can get a deallocation-only handle that is appropriate for some allocation, that allocation needs to have been allocated at some point. Presumably with a “full” handle. Does that mean that a “full” handle knows how to downgrade itself to deallocation-only? What’s the API for that?

Before we accept it as a goal to support this use case, I’d like someone who wants it to come up with a more comprehensive API proposal. That should be the starting point of the discussion.

But if this adds significant complexity to the type signature even for users who do not use this feature, I’m not sure we should accept such a narrow use case.

Tim Diekmann · Answer 4 · Sat May 04 2019 18:12:53 GMT+0800 (China Standard Time)

This is not quite true. Other APIs like Box::clone or Rc::make_mut may need to allocate.

I'm sorry, I think I expressed myself unclearly. The struct itself only needs Dealloc as a bound, since Drop only needs dealloc. Things like Box::clone could bind A: Alloc + Dealloc.

Scott J Maddox · Answer 5 · Sun May 05 2019 06:22:48 GMT+0800 (China Standard Time)

Yeah, the impl<T> for Box (and other collections) would just need to be split into impl<T, A: Dealloc> and impl<T, A: Alloc+Dealloc>. You wouldn't need a separate new_dealloc_only_in.

Simon Sapin · Answer 6 · Sun May 05 2019 06:30:39 GMT+0800 (China Standard Time)

I don’t understand. If new_dealloc_only_in is not needed, please provide the full signatures you would expect for the Box type, the constructor, and the destructor. In particular, how is the allocation owned by Box<T, A: Dealloc> created?

Tim Diekmann · Answer 7 · Sun May 05 2019 06:33:12 GMT+0800 (China Standard Time)

Not a signature, but with from_raw_in it would be possible. It's rather low-level, but for complex data structures this might make sense.

Scott J Maddox · Answer 8 · Sun May 05 2019 07:05:50 GMT+0800 (China Standard Time)

You're right, my previous suggestion was incorrect. However, it can be done like this (unless I'm missing something):

struct Box<T: ?Sized, D>(Unique<T>, D);

impl<T: ?Sized, D: Dealloc> Drop for Box<T, D> {
    fn drop(&mut self);
}

impl<T: ?Sized, D: Dealloc, A: Alloc<Dealloc=D>> Box<T, D> {
    fn new_in(x: T, a: A) -> Box<T, D>;
}
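
A runnable sketch of this shape, under a few stated assumptions: the `Dealloc`/`Alloc` signatures, the `downgrade` method, and the `System`/`SystemFree`/`MyBox` names are all hypothetical. Two details differ from the signatures above. The `A` parameter sits on `new_in` rather than on the impl block, because as far as I can tell a type parameter that appears only in a bound on the impl is rejected as unconstrained (E0207). The sketch also restricts itself to sized, non-zero-sized `T`.

```rust
use std::alloc::Layout;
use std::mem::size_of;
use std::ptr::NonNull;

// Hypothetical deallocation-only trait.
trait Dealloc {
    // Safety: `ptr` must come from the matching allocator with this `layout`.
    unsafe fn dealloc(&mut self, ptr: NonNull<u8>, layout: Layout);
}

// Hypothetical allocation trait that names its deallocation-only handle.
trait Alloc {
    type Dealloc: Dealloc;
    fn alloc(&mut self, layout: Layout) -> NonNull<u8>;
    // Give up the ability to allocate, keeping only what `drop` needs.
    fn downgrade(self) -> Self::Dealloc;
}

struct System; // full handle (zero-sized here only to keep the sketch short)
struct SystemFree; // deallocation-only handle

impl Dealloc for SystemFree {
    unsafe fn dealloc(&mut self, ptr: NonNull<u8>, layout: Layout) {
        unsafe { std::alloc::dealloc(ptr.as_ptr(), layout) }
    }
}

impl Alloc for System {
    type Dealloc = SystemFree;
    fn alloc(&mut self, layout: Layout) -> NonNull<u8> {
        assert!(layout.size() != 0, "sketch: zero-sized layouts are not handled");
        let raw = unsafe { std::alloc::alloc(layout) };
        NonNull::new(raw).unwrap_or_else(|| std::alloc::handle_alloc_error(layout))
    }
    fn downgrade(self) -> SystemFree {
        SystemFree
    }
}

struct MyBox<T, D: Dealloc> {
    ptr: NonNull<T>,
    d: D,
}

impl<T, D: Dealloc> MyBox<T, D> {
    // Safe constructor: allocate with the full handle, store only the downgraded one.
    fn new_in<A: Alloc<Dealloc = D>>(x: T, mut a: A) -> Self {
        let ptr = a.alloc(Layout::new::<T>()).cast::<T>();
        unsafe { ptr.as_ptr().write(x) };
        MyBox { ptr, d: a.downgrade() }
    }

    fn get(&self) -> &T {
        unsafe { self.ptr.as_ref() }
    }
}

impl<T, D: Dealloc> Drop for MyBox<T, D> {
    fn drop(&mut self) {
        unsafe {
            std::ptr::drop_in_place(self.ptr.as_ptr());
            self.d.dealloc(self.ptr.cast(), Layout::new::<T>());
        }
    }
}

// Builds a box through the safe constructor and reports its value and size.
fn new_in_demo() -> (u64, usize) {
    let b = MyBox::new_in(42u64, System);
    (*b.get(), size_of::<MyBox<u64, SystemFree>>())
}
```

The box stores only `D`, so its size does not depend on how large the full handle `A` is.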

Simon Sapin · Answer 9 · Sun May 05 2019 14:18:25 GMT+0800 (China Standard Time)

@TimDiekmann So the only way to use this feature would require unsafe code?

@scottjmaddox So we’d have an Alloc::downgrade(self) -> Self::Dealloc method, and the choice of giving up on Box::clone or not (A = D) would be based on using a different allocator type?

Scott J Maddox · Answer 10 · Sun May 05 2019 15:37:24 GMT+0800 (China Standard Time)

Yes, you would need something like Alloc::downgrade(self) -> Self::Dealloc; perhaps just Alloc::get_dealloc(&self) -> Self::Dealloc. And as you say, some allocators would provide a type that implements Alloc + Dealloc instead of just Dealloc, and the former would impl Box::clone.

Ideally, there would be additional methods like Box::clone_in that accept an Alloc argument.

gnzlbg · Answer 11 · Sun May 05 2019 17:34:30 GMT+0800 (China Standard Time)

So the only way to use this feature would require unsafe code?

Yes. I don't know if the analogy helps, but have you used C++'s std::unique_ptr? It only needs a "custom deleter" to free itself on destruction. The std::unique_ptr itself is "move only", and cannot be implicitly cloned (has no copy constructor/assignment).

IIUC what's being proposed here is the same. Box<T, A: Dealloc> is the bound on the type. This is useful, e.g., because you don't necessarily need to construct a Box via Box::new, you can also construct a Box from a raw pointer, e.g., coming from FFI (e.g. from a C++ unique_ptr).

Most of the Box functionality would just be impl<T, A: Dealloc> for Box<T, A> { ... }. As you mention, some of the functionality, like Box::new, would be in an impl<T, A: Alloc + Dealloc> for Box<T, A> { ... } and some of it, like Box::clone, in an impl<T: Clone, A: Alloc + Dealloc> for Box<T, A> { ... }.

Scott J Maddox · Answer 12 · Mon May 06 2019 02:04:11 GMT+0800 (China Standard Time)

@gnzlbg Is there any reason my suggestion for a safe new_in function would not work? There should certainly be from_raw_in, too. And Box::clone could still be in an impl<T: Clone, A: Alloc + Dealloc> for Box<T, A> { ... } block so that it's available if the allocator handle is Alloc+Dealloc.

gnzlbg · Answer 13 · Mon May 06 2019 17:13:19 GMT+0800 (China Standard Time)

@scottjmaddox

One of the main use cases for Box<T, D: Dealloc> is FFI wrappers, where e.g. a C library gives you ownership of some value, and provides you with a function to free it. There is no way to clone that Box, or use the Box type to allocate anything else with it.

Ideally, you'd just implement Dealloc for a MyCResourceDeallocator ZST, and use Box<MyCResource, MyCResourceDeallocator> directly in C FFI.

I'm not sure how you would be able to achieve that with new_in, but I think this is a use case worth supporting.
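
A sketch of the FFI use case, assuming a simplified `Dealloc` trait without a layout argument. The "C library" is simulated by two Rust functions (`c_resource_new`, `c_resource_free`), and `FfiBox`, `CResource`, and `CResourceDealloc` are hypothetical names; a real wrapper would call `extern "C"` functions instead.

```rust
use std::ptr::NonNull;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts calls to the simulated C free function.
static FREED: AtomicUsize = AtomicUsize::new(0);

struct CResource {
    value: i32,
}

// Stand-in for a C constructor that hands out ownership of a resource.
fn c_resource_new(value: i32) -> *mut CResource {
    Box::into_raw(Box::new(CResource { value }))
}

// Stand-in for the matching C free function.
unsafe fn c_resource_free(ptr: *mut CResource) {
    drop(unsafe { Box::from_raw(ptr) });
    FREED.fetch_add(1, Ordering::SeqCst);
}

// Hypothetical deallocation-only trait (simplified: no layout argument).
trait Dealloc {
    unsafe fn dealloc(&mut self, ptr: NonNull<u8>);
}

// Zero-sized deallocator: there is no way to allocate through it.
struct CResourceDealloc;

impl Dealloc for CResourceDealloc {
    unsafe fn dealloc(&mut self, ptr: NonNull<u8>) {
        unsafe { c_resource_free(ptr.as_ptr().cast::<CResource>()) }
    }
}

struct FfiBox<T, D: Dealloc> {
    ptr: NonNull<T>,
    d: D,
}

impl<T, D: Dealloc> FfiBox<T, D> {
    // Safety: `ptr` must be owned by the caller and freeable by `d`.
    unsafe fn from_raw_in(ptr: *mut T, d: D) -> Self {
        FfiBox { ptr: NonNull::new(ptr).expect("null pointer from C"), d }
    }

    fn get(&self) -> &T {
        unsafe { self.ptr.as_ref() }
    }
}

impl<T, D: Dealloc> Drop for FfiBox<T, D> {
    fn drop(&mut self) {
        unsafe { self.d.dealloc(self.ptr.cast()) }
    }
}

// Wraps a "C" resource, reads it, and reports how many frees the drop caused.
fn ffi_demo() -> (i32, usize) {
    let before = FREED.load(Ordering::SeqCst);
    let value = {
        let b = unsafe { FfiBox::from_raw_in(c_resource_new(7), CResourceDealloc) };
        let v = b.get().value;
        v
    };
    (value, FREED.load(Ordering::SeqCst) - before)
}
```

Because `CResourceDealloc` is zero-sized, `FfiBox<CResource, CResourceDealloc>` is as large as a single pointer.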

Scott Maddox · Answer 14 · Tue May 07 2019 05:32:07 GMT+0800 (China Standard Time)

@gnzlbg I totally agree that that's a use case worth supporting, and that from_raw_in is a great way to support that. I'm just asking if there's any reason you couldn't also have the new_in I suggested, so that there's a way to use this feature without unsafe.

gnzlbg · Answer 15 · Tue May 07 2019 15:09:30 GMT+0800 (China Standard Time)

@scott-maddox would that require implementing Alloc for MyCResourceDeallocator ?

Scott J Maddox · Answer 16 · Wed May 08 2019 04:07:22 GMT+0800 (China Standard Time)

@gnzlbg No, it would not. With my suggestion, implementing Alloc would require implementing Dealloc, but implementing Dealloc would not require implementing Alloc.

(Side note: this is the same person as scott-maddox; I meant to use this account.)

Mike Hommey · Answer 17 · Wed May 08 2019 14:30:10 GMT+0800 (China Standard Time)

I was thinking about this a little, and I think this means there needs to be a different trait for realloc too. Because Alloc + Dealloc doesn't allow doing something specific for realloc instead of doing a dealloc + alloc sequence. So there would need to be a Realloc trait, as well as a default impl<A: Alloc + Dealloc> Realloc for A.

gnzlbg · Answer 18 · Wed May 08 2019 14:31:51 GMT+0800 (China Standard Time)

as well as a default impl<A: Alloc + Dealloc> Realloc for A.

The problem with that is that, without specialization, users cannot override that impl.

Mike Hommey · Answer 19 · Wed May 08 2019 14:33:49 GMT+0800 (China Standard Time)

Thus "default" in my sentence.

gnzlbg · Answer 20 · Wed May 08 2019 14:39:18 GMT+0800 (China Standard Time)

I thought that wasn't intended. If that's by design, then the main downside is still that we would be blocking the stabilization of these APIs on stable specialization. I'm not sure that would make strategic sense.

Mike Hommey · Answer 21 · Wed May 08 2019 14:42:05 GMT+0800 (China Standard Time)

I'm not sure it would need to block on stable specialization. Implementers should be able to impl Realloc for their type whether specialization is stable or not, shouldn't they?

Mike Hommey · Answer 22 · Wed May 08 2019 14:45:03 GMT+0800 (China Standard Time)

Without a Realloc trait, Alloc should keep both realloc and dealloc methods, and there should be an impl<A: Alloc> Dealloc for A (which is what I mentioned in #9 (comment) already).

gnzlbg · Answer 23 · Wed May 08 2019 14:55:55 GMT+0800 (China Standard Time)

Is there some already-stable magic that allows users to specialize default impls from liballoc without specialization?

If not, your blanket impl<A: Alloc> Dealloc for A has the same problem.

There are two impls for your allocator: the blanket one that you provide (e.g. Dealloc and Realloc), and the one that a user might want to write. Without specialization, those two conflict.

Mike Hommey · Answer 24 · Wed May 08 2019 15:33:19 GMT+0800 (China Standard Time)

A user wouldn't have to write a Dealloc impl if they write an Alloc impl, because dealloc is already in there.

gnzlbg · Answer 25 · Wed May 08 2019 15:43:12 GMT+0800 (China Standard Time)

I’m not sure we are talking about the same trait hierarchy then. I understood this issue as separating dealloc from the Alloc trait into a different trait, such that ‘trait Alloc: Dealloc { ... no dealloc here ... }’.

Mike Hommey · Answer 26 · Wed May 08 2019 15:44:00 GMT+0800 (China Standard Time)

And I'm saying you can't detach dealloc entirely from the trait unless you detach realloc in yet another trait. Although with trait Alloc: Dealloc, that might work... but that was not the most discussed option from the topmost comment.

Simon Sapin · Answer 27 · Wed May 08 2019 15:52:28 GMT+0800 (China Standard Time)

Implementers should be able to impl Realloc for their type whether specialization is stable or not, shouldn't they?

As far as I understand, no. Such an impl would conflict with impl<A: Alloc + Dealloc> Realloc for A.

And yes, it does sound like trait Alloc: Dealloc {…} would be required so that realloc can be a default method of the Alloc trait (with a default behavior based on alloc + copy + dealloc). Is there a downside to that?

Mike Hommey · Answer 28 · Wed May 08 2019 15:55:16 GMT+0800 (China Standard Time)

There probably isn't a downside. All I'm saying at this point is that not using specialization limits the options we have in how this can be approached to trait Alloc: Dealloc and trait Dealloc { ... } impl<T: Alloc> Dealloc for T { ... } (with the dealloc function still being in Alloc), while we've only discussed the other options so far.

Mike Hommey · Answer 29 · Wed May 08 2019 16:27:06 GMT+0800 (China Standard Time)

As far as I understand, no. Such an impl would conflict with impl<A: Alloc + Dealloc> Realloc for A.

Tested, and that's unfortunately true. Specialization can't come soon enough :(

Tim Diekmann · Answer 30 · Wed May 08 2019 18:01:59 GMT+0800 (China Standard Time)

Tested, and that's unfortunately true. Specialization can't come soon enough :(

As specialization is on the 2019 roadmap, I think we can rely on it. I don't expect the allocator_api to be stabilized in the next 6 months?

gnzlbg · Answer 31 · Thu May 09 2019 01:54:40 GMT+0800 (China Standard Time)

As specialization is on the 2019 roadmap, I think we can rely on it.

I don't share your optimism, but I do think that we should try to keep this issue on topic.

We are mixing two issues here: whether it is worth separating dealloc from Alloc "somehow", and, if we had a hierarchy or set of allocator traits (Alloc, Dealloc, Realloc, ...), how we would design it. Maybe we should open a new issue about this second point to discuss the different ways to design that.

Mike Hommey · Answer 32 · Thu May 09 2019 03:18:34 GMT+0800 (China Standard Time)

My point is that there are four ways to go around separating dealloc from Alloc that have been proposed in this issue. Two of them have been discussed mainly, and none of those two appear to work out without having a separate Realloc.

gnzlbg · Answer 33 · Thu May 09 2019 03:59:20 GMT+0800 (China Standard Time)

AFAICT this would work:

trait Dealloc { fn dealloc(...); }
trait Alloc: Dealloc {
    fn alloc(...) -> ...;
    fn realloc(...) -> ... { /*can call both alloc and dealloc here*/ }
}
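
A runnable sketch of this supertrait layout, assuming `Layout`-based signatures that are not spelled out above. `CountingSystem` is a hypothetical allocator that forwards to the global allocator and counts calls, so the default `realloc` can be observed doing one `alloc` and one `dealloc`.

```rust
use std::alloc::Layout;
use std::ptr::{self, NonNull};

trait Dealloc {
    unsafe fn dealloc(&mut self, ptr: NonNull<u8>, layout: Layout);
}

trait Alloc: Dealloc {
    unsafe fn alloc(&mut self, layout: Layout) -> NonNull<u8>;

    // Default: allocate a new block, copy the old contents, free the old block.
    unsafe fn realloc(&mut self, ptr: NonNull<u8>, old: Layout, new_size: usize) -> NonNull<u8> {
        unsafe {
            let new_layout =
                Layout::from_size_align(new_size, old.align()).expect("invalid layout");
            let new_ptr = self.alloc(new_layout);
            ptr::copy_nonoverlapping(ptr.as_ptr(), new_ptr.as_ptr(), old.size().min(new_size));
            self.dealloc(ptr, old);
            new_ptr
        }
    }
}

// Hypothetical allocator: forwards to the global allocator and counts calls.
#[derive(Default)]
struct CountingSystem {
    allocs: usize,
    deallocs: usize,
}

impl Dealloc for CountingSystem {
    unsafe fn dealloc(&mut self, ptr: NonNull<u8>, layout: Layout) {
        self.deallocs += 1;
        unsafe { std::alloc::dealloc(ptr.as_ptr(), layout) }
    }
}

impl Alloc for CountingSystem {
    unsafe fn alloc(&mut self, layout: Layout) -> NonNull<u8> {
        self.allocs += 1;
        let raw = unsafe { std::alloc::alloc(layout) };
        NonNull::new(raw).unwrap_or_else(|| std::alloc::handle_alloc_error(layout))
    }
    // `realloc` is inherited; an allocator with a smarter strategy can override it.
}

// Grows a 4-byte block to 8 bytes and reports (first byte, allocs, deallocs).
fn grow_demo() -> (u8, usize, usize) {
    let mut a = CountingSystem::default();
    let small = Layout::from_size_align(4, 1).unwrap();
    let large = Layout::from_size_align(8, 1).unwrap();
    unsafe {
        let p = a.alloc(small);
        ptr::write_bytes(p.as_ptr(), 9, small.size());
        let q = a.realloc(p, small, large.size());
        let first = q.as_ptr().read();
        a.dealloc(q, large);
        (first, a.allocs, a.deallocs)
    }
}
```

No blanket impl is involved here, so an implementer can override `realloc` on stable Rust without specialization.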

gnzlbg · Answer 34 · Thu May 09 2019 04:04:00 GMT+0800 (China Standard Time)

This would also work (no super trait):

trait Dealloc { fn dealloc(...); }
trait Alloc {
    type Dealloc: Dealloc;
    fn alloc(...) -> ...;
    fn get_dealloc(&self) -> &Self::Dealloc;
    fn realloc(...) -> ... { 
        /* can call both self.alloc(...) and self.get_dealloc().dealloc(...) */ 
    }
}
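
The associated-type variant can be sketched the same way. This toy version uses fake `usize` addresses instead of pointers so that it stays free of unsafe code; `Pool` and `FreeList` are hypothetical names, and a real `realloc` would also copy the old contents.

```rust
use std::cell::Cell;

trait Dealloc {
    fn dealloc(&self, addr: usize);
}

trait Alloc {
    type Dealloc: Dealloc;
    fn alloc(&self, size: usize) -> usize;
    fn get_dealloc(&self) -> &Self::Dealloc;

    // Default: new allocation, then free the old one through the associated handle.
    fn realloc(&self, addr: usize, new_size: usize) -> usize {
        let new_addr = self.alloc(new_size);
        // A real allocator would copy the old contents here.
        self.get_dealloc().dealloc(addr);
        new_addr
    }
}

// Deallocation side: only records how many blocks were freed.
struct FreeList {
    freed: Cell<usize>,
}

impl Dealloc for FreeList {
    fn dealloc(&self, _addr: usize) {
        self.freed.set(self.freed.get() + 1);
    }
}

// Allocation side: a bump counter that owns its deallocation handle.
struct Pool {
    next: Cell<usize>,
    free: FreeList,
}

impl Alloc for Pool {
    type Dealloc = FreeList;
    fn alloc(&self, size: usize) -> usize {
        let addr = self.next.get();
        self.next.set(addr + size);
        addr
    }
    fn get_dealloc(&self) -> &FreeList {
        &self.free
    }
}

// Reallocates one block and reports (new address, number of frees).
fn realloc_demo() -> (usize, usize) {
    let pool = Pool { next: Cell::new(0), free: FreeList { freed: Cell::new(0) } };
    let a = pool.alloc(16);
    let b = pool.realloc(a, 32);
    (b, pool.free.freed.get())
}
```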

gnzlbg · Answer 35 · Thu May 09 2019 04:11:50 GMT+0800 (China Standard Time)

I don't see the other approaches discussed in the issue much, but the OP mentions:

Associate the Alloc trait: trait Dealloc { type Alloc: Alloc; ... }

This does not work for the FFI use case. It would mean that to implement Dealloc for a type, you would need another type with a meaningful Alloc implementation, which for that use case does not exist (The C API gives you ownership of some memory, and a way to free it, but no way to allocate anything).

I have nothing against exploring more complicated hierarchies:

trait Alloc { /*only:*/ fn alloc(...) -> ...; }
trait Dealloc { fn dealloc(...); }
trait Realloc: Alloc + Dealloc { fn realloc(...) -> ... { /* default using alloc and dealloc */ } }
trait CollectionAllocator: Realloc + .... { ... }
struct Vec<T, A: CollectionAllocator> { ... }

or other implementation approaches, e.g., blanket impls, specialization, how would we extend those hierarchies in a backwards-compatible way if we discover later on that we need a new trait in the middle of the hierarchy, etc. but that looks like an overarching design question that can happen in parallel to this discussion.

Peter Todd · Answer 36 · Thu May 09 2019 04:14:17 GMT+0800 (China Standard Time)

Yup, as long as the allocation doesn't need to be "in-place", realloc is an optimization over alloc if you already have a handle to the allocator available. If you don't, realloc can succeed where alloc can't, since some allocators could use the pointer to the allocation to get a pointer to the allocator. For example, Vec could be resized without a handle to the allocator.

But that design conflicts with the current one where creating zero-sized structures is a no-op, so probably not worth discussing further.

gnzlbg · Answer 37 · Thu May 09 2019 04:22:04 GMT+0800 (China Standard Time)

For example, Vec could be resized without a handle to the allocator.

@petertodd I think we could do this by using the API proposed in #12 on all collections (not only Box<T>).

Simon Sapin · Answer 38 · Thu May 09 2019 04:32:22 GMT+0800 (China Standard Time)

The C API gives you ownership of some memory, and a way to free it, but no way to allocate anything

This sounds like this API is simply not an allocator. It has a destructor function that you are responsible for calling (because C), which is a job for the Drop trait and a wrapper trait more than for a Dealloc trait.

Mike Hommey · Answer 39 · Thu May 09 2019 04:37:10 GMT+0800 (China Standard Time)

This would also work (no super trait):

trait Dealloc { fn dealloc(...); }
trait Alloc {
    type Dealloc: Dealloc;
    fn alloc(...) -> ...;
    fn get_dealloc(&self) -> &Self::Dealloc;
    fn realloc(...) -> ... { 
        /* can call both self.alloc(...) and self.get_dealloc().dealloc(...) */ 
    }
}

For Box<T, Dealloc> to be possible, we'd need the opposite. But I think we don't actually need the whole get_something approach.

trait Dealloc { fn dealloc(...); }
trait Alloc: Dealloc {
    fn alloc(...) -> ...;
    fn realloc(...) -> ... { /*can call both alloc and dealloc here*/ }
}

struct Box<T, A: Dealloc>(...);

impl<T, A: Dealloc> Drop for Box<T, A> { ... }

impl<T: Clone, A: Alloc> Clone for Box<T, A> { ... }

is what we'd want, presumably.
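
A runnable sketch of this layout, assuming `Layout`-based signatures and a hypothetical `System` handle that forwards to the global allocator. The type is called `MyBox` to avoid clashing with the standard `Box`, zero-sized `T` is not handled, and the extra `A: Clone` bound on `Clone` is an assumption of the sketch so that the new box can store its own copy of the handle.

```rust
use std::alloc::Layout;
use std::ptr::NonNull;

trait Dealloc {
    unsafe fn dealloc(&self, ptr: NonNull<u8>, layout: Layout);
}

trait Alloc: Dealloc {
    fn alloc(&self, layout: Layout) -> NonNull<u8>;
}

// Hypothetical handle that forwards to the global allocator.
#[derive(Clone, Copy)]
struct System;

impl Dealloc for System {
    unsafe fn dealloc(&self, ptr: NonNull<u8>, layout: Layout) {
        unsafe { std::alloc::dealloc(ptr.as_ptr(), layout) }
    }
}

impl Alloc for System {
    fn alloc(&self, layout: Layout) -> NonNull<u8> {
        assert!(layout.size() != 0, "sketch: zero-sized layouts are not handled");
        let raw = unsafe { std::alloc::alloc(layout) };
        NonNull::new(raw).unwrap_or_else(|| std::alloc::handle_alloc_error(layout))
    }
}

// The struct itself only requires `Dealloc`.
struct MyBox<T, A: Dealloc> {
    ptr: NonNull<T>,
    a: A,
}

// Available for every handle, including deallocation-only ones.
impl<T, A: Dealloc> MyBox<T, A> {
    fn get(&self) -> &T {
        unsafe { self.ptr.as_ref() }
    }
}

// Only available when the handle can allocate.
impl<T, A: Alloc> MyBox<T, A> {
    fn new_in(x: T, a: A) -> Self {
        let ptr = a.alloc(Layout::new::<T>()).cast::<T>();
        unsafe { ptr.as_ptr().write(x) };
        MyBox { ptr, a }
    }
}

impl<T, A: Dealloc> Drop for MyBox<T, A> {
    fn drop(&mut self) {
        unsafe {
            std::ptr::drop_in_place(self.ptr.as_ptr());
            self.a.dealloc(self.ptr.cast(), Layout::new::<T>());
        }
    }
}

// Cloning needs to allocate, so it is bounded on `Alloc`.
impl<T: Clone, A: Alloc + Clone> Clone for MyBox<T, A> {
    fn clone(&self) -> Self {
        MyBox::new_in(self.get().clone(), self.a.clone())
    }
}

// Clones a box and reports the cloned value and whether the allocations differ.
fn clone_demo() -> (String, bool) {
    let first = MyBox::new_in(String::from("original"), System);
    let second = first.clone();
    (second.get().clone(), first.ptr != second.ptr)
}
```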

Simon Sapin · Answer 40 · Thu May 09 2019 04:55:33 GMT+0800 (China Standard Time)

I’d like us to take a step back for a moment. As library designers it can be satisfying to make APIs that are as general or flexible as possible, but do we know anyone who actually wants to use this? Or is this all hypothetical? Remember that none of this issue is relevant unless:

There’s an allocator that requires a non-zero-size handle for allocation
And that allocator does not require a non-zero-size handle for deallocation
And the user is willing to give up on clone and any other API that needs to (re)allocate
And the cost of unnecessarily storing a full handle is significant

Secondly, if this is indeed a real use case, how important is it to use std::boxed::Box<T, A> for it? Could it just as well be served by a NoOpDeallocBox<T> type on crates.io?

This thread is quickly getting long, which is a sign that supporting this use case is not easy. But maybe it’s too niche to be worth the design complexity.

Scott J Maddox · Answer 41 · Thu May 09 2019 05:05:49 GMT+0800 (China Standard Time)

Now that you mention it, I think Realloc should be a separate trait, that way the collection methods that need it can be bounded on it precisely. And I don't think it needs to have a default impl. Implementing an allocator is not something to be taken on lightly. Adding one more trait impl is not that big of a deal. If we can somehow reserve the option to later add one once specialization is stable, that would be good, though.

Here's what I have in mind:

trait Alloc {
    type Realloc: Realloc;
    type Dealloc: Dealloc;
    ...
}
trait Realloc {
    type Dealloc: Dealloc;
    ...
}
trait Dealloc { ... }

struct Box<T: ?Sized, D: Dealloc>(Unique<T>, D);

impl<T: ?Sized, D: Dealloc> Drop for Box<T, D> {
    fn drop(&mut self);
}

impl<T: ?Sized, D: Dealloc, A: Alloc<Dealloc=D>> Box<T, D> {
    fn new_in(x: T, a: A) -> Box<T, D>;
}

pub struct RawVec<T, D: Dealloc> {
    ptr: Unique<T>,
    cap: usize,
    a: D,
}

impl<T: ?Sized, D: Dealloc> Drop for RawVec<T, D> {
    fn drop(&mut self);
}

impl<T: ?Sized, R: Realloc<Dealloc=R> + Dealloc> RawVec<T, R> {
    fn double(&mut self) -> RawVec<T, R>;
}

impl<T: ?Sized, R: Realloc<Dealloc=D>, D: Dealloc> RawVec<T, D> {
    fn double_in(&mut self, a: R) -> RawVec<T, D>;
}

If we switch the bounds to BuildAlloc, BuildRealloc, and BuildDealloc (see issue #12), this could potentially enable some really unique and clever allocator designs... Designs that aren't possible in any other language.

Edit: I'm tempted to go ahead and assume we'll be switching to AllocHandle, etc., and update my example, because it makes it significantly more clear...

Edit 2: Add Dealloc bound for Box struct, add RawVec struct definition, fix bounds for double

gnzlbg · Answer 42 · Thu May 09 2019 05:23:30 GMT+0800 (China Standard Time)

This sounds like this API is simply not an allocator. It has a destructor function that you are responsible for calling (because C), which is a job for the Drop trait and a wrapper trait more than for a Dealloc trait.

@SimonSapin I don't think that works. Consider:

let b: Box<CVal, Dealloc> = c_api_call();
let cval: CVal = *b; // moves CVal into the stack, calls `Dealloc::dealloc` to free the memory
let _ = cval; // drops CVal (might do nothing, might do something)

Here, Dealloc::dealloc might call c_free_cval_memory(CVal*), and <CVal as Drop>::drop() might, e.g., do nothing (or call a different c_drop_cval() function).

Without Dealloc, I would somehow need to override the impl of Drop for Box<CVal> to be able to solve this problem with just Drop. I don't think this can be done, even with specialization, since that would need to expose the internals of Box.
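
The distinction between freeing the memory and dropping the value can be sketched as follows. Since a library-defined box cannot support `*b` moves, the sketch uses a hypothetical `into_inner` method instead. `FfiBox`, `CFree`, and the event log are illustration-only names, and the C free function is simulated by releasing a `Box` allocation without running the value's destructor.

```rust
use std::cell::RefCell;
use std::mem::ManuallyDrop;
use std::ptr::NonNull;

thread_local! {
    // Records the order of events for the demo.
    static LOG: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

fn log(event: &'static str) {
    LOG.with(|l| l.borrow_mut().push(event));
}

struct CVal(u32);

impl Drop for CVal {
    fn drop(&mut self) {
        log("drop CVal");
    }
}

// Hypothetical deallocation-only trait (simplified: no layout argument).
trait Dealloc {
    unsafe fn dealloc(&mut self, ptr: NonNull<u8>);
}

// Stand-in for a C function that frees a CVal's storage without destroying the value.
struct CFree;

impl Dealloc for CFree {
    unsafe fn dealloc(&mut self, ptr: NonNull<u8>) {
        log("free memory");
        // Release the allocation only; `ManuallyDrop` suppresses `CVal::drop`.
        drop(unsafe { Box::from_raw(ptr.as_ptr().cast::<ManuallyDrop<CVal>>()) });
    }
}

struct FfiBox<T, D: Dealloc> {
    ptr: NonNull<T>,
    d: D,
}

impl<T, D: Dealloc> FfiBox<T, D> {
    // Safety: `ptr` must be owned by the caller and freeable by `d`.
    unsafe fn from_raw_in(ptr: *mut T, d: D) -> Self {
        FfiBox { ptr: NonNull::new(ptr).expect("null pointer"), d }
    }

    // Moves the value out and frees the memory without dropping the value.
    fn into_inner(self) -> T {
        let this = ManuallyDrop::new(self); // skip `FfiBox::drop`
        unsafe {
            let value = this.ptr.as_ptr().read();
            let mut d = std::ptr::read(&this.d);
            d.dealloc(this.ptr.cast());
            value
        }
    }
}

impl<T, D: Dealloc> Drop for FfiBox<T, D> {
    fn drop(&mut self) {
        unsafe {
            std::ptr::drop_in_place(self.ptr.as_ptr());
            self.d.dealloc(self.ptr.cast());
        }
    }
}

// Returns the recorded order of events.
fn event_order() -> Vec<&'static str> {
    LOG.with(|l| l.borrow_mut().clear());
    let raw = Box::into_raw(Box::new(CVal(5)));
    let b = unsafe { FfiBox::from_raw_in(raw, CFree) };
    let cval = b.into_inner();
    log("value still alive");
    drop(cval);
    LOG.with(|l| l.borrow().clone())
}
```

The memory is released first and the value's own destructor runs later, which a single `Drop` impl on a wrapper type would not separate.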

Scott J Maddox · Answer 43 · Thu May 09 2019 05:53:47 GMT+0800 (China Standard Time)

I tend to prioritize allocator users over allocator implementors (traits are implemented once, but used throughout the ecosystem), and I don't see what value this would add for users.

I support prioritizing simplicity for end users, which requires maximizing power of expression for library authors. I look at this as a way to maximize power of expression for library authors. If the library author doesn't want to distinguish between Alloc and Realloc then they can just impl Alloc+Realloc for their handle type and write all bounds as A: Alloc+Realloc.

When would it be helpful to not have a bound on Realloc, but to have a bound on both Alloc+Dealloc, which would give you realloc for free ?

Firstly, Alloc+Dealloc only gives you realloc for free if you are fine with a naive implementation for realloc. I expect that only the most basic allocators will not implement realloc themselves.

Secondly, I don't know if there's an allocator that would benefit from a separate Realloc trait; this is new territory. It's not clear how one would, but we cannot know for sure that none would after just a few minutes thinking about it.

The only thing I can imagine would be to, e.g., error at compile-time if some allocator does not implement Realloc, but nothing guarantees you that a Realloc impl won't just do what the default Alloc+Dealloc impl would do, so I don't see any advantage for users of the trait over just having a realloc method in the Alloc trait.

That is a potentially interesting use case, if the allocator author wanted to make it very clear that realloc is not optimized. It's not a great use case though, since it would be kind of annoying as an end user.

Which value does this add to RawVec, the users of RawVec, like String or Vec, and the users of these types ?

Again, nothing that I can think of, but that doesn't mean there never will be a benefit for future allocators and/or collections.

If you pass Vec an A: Alloc + Dealloc, I expect the vector to be able to grow, but it won't in your case because it doesn't implement Realloc.

If the allocator author provides a handle type that is Alloc + Dealloc but not Alloc + Realloc + Dealloc then that means they don't want to allow Realloc with it for some reason. If they did, then they would just have made it Alloc + Realloc + Dealloc.

Scott J Maddox · Answer 44 · Thu May 09 2019 06:00:48 GMT+0800 (China Standard Time)

To add to my previous comment, it's possible that having a separate Realloc trait will be important for properly designing allocators that have handles with lifetime bounds, e.g. Box<T, A=ArenaAlloc<'a>>. I've only done a little bit of design along these lines and I didn't consider reallocation, so I don't know if it would end up being important or not. We need to look into this deeper.

Scott J Maddox · Answer 45 · Fri May 10 2019 05:17:26 GMT+0800 (China Standard Time)

I don't think that works. Consider:

let b: Box<CVal, Dealloc> = c_api_call();
let cval: CVal = *b; // moves CVal into the stack, calls `Dealloc::dealloc` to free the memory
let _ = cval; // drops CVal (might do nothing, might do something)

@gnzlbg I think what @SimonSapin was saying is that you could create a new wrapper type that implements drop rather than using Box. This is how FFI wrapper libraries currently work, AFAIK. Having Box<T, A:Dealloc> might make implementing the FFI wrapper a bit easier, though, since you wouldn't need a wrapper type that implements Drop for every C type.

Scott J Maddox · Answer 46 · Fri May 10 2019 05:42:48 GMT+0800 (China Standard Time)

I’d like us to take a step back for a moment. As library designers it can be satisfying to make APIs that are as general or flexible as possible, but do we know anyone who actually wants to use this? Or is this all hypothetical? Remember that none of this issue is relevant unless:

I would like to be able to have a separate Dealloc trait (or more accurately a separate BuildDealloc trait) for implementing zero-cost arena allocators.

* There’s an allocator that requires a non-zero-size handle for allocation

* _And_ that allocator does **not** require non-zero-size handle for deallocation

Huh? Neither of these is a requirement. Take the example of an arena bump allocator, and let's assume my BuildAlloc/BuildDealloc suggestion is incorporated. The BuildDealloc type would be zero sized and would be a no-op, because deallocation does nothing. Without splitting out BuildDealloc, we would have to use BuildAlloc to retrieve a pointer to the allocator state, and thus we would have to rely on the compiler optimizing away all of that, ultimately dead, code. Now perhaps it can do that optimization without issue, I don't know. But there might be other cases that I'm not thinking of that it can not easily optimize away.

* _And_ the user is willing to give up on `clone` and any other API that needs to (re)allocate

Or perhaps the user just wants to have more control over where the value is cloned to, which could be provided by a new clone_in method.

* _And_ the cost of unnecessarily storing a full handle is significant

Storing a full handle inside every box is almost always going to be prohibitively expensive.

Secondly, if this is indeed a real use case, how important is it to use std::boxed::Box<T, A> for it? Could it just as well be served by a NoOpDeallocBox<T> type on crates.io?

By this logic, we shouldn't do Box<T, A> at all. But there's value in having a first-party solution. It provides cohesion for the community.

This thread is quickly getting long, which is a sign that supporting this use case is not easy. But maybe it’s too niche to be worth the design complexity.

I don't think the length of the thread is a good metric for how easy supporting a use case is. Rather, I think it's an indication that there is interest and many possible approaches that require further discussion.

Simon Sapin · Answer 47 · Fri May 10 2019 06:16:24 GMT+0800 (China Standard Time)

Huh? Neither of these is a requirement.

I think we’re in agreement on this. I was saying that if all handles are zero-size (e.g. you have a malloc-and-free-style allocator with global state) then this thread is not relevant. If even deallocation requires a non-zero-size handle (e.g. an allocator with multiple instances/arenas/regions that reuses freed space) then this thread is also not relevant.

Or perhaps the user just wants to have more control over where the value is cloned to, which could be provided by a new clone_in method.

Yes, using clone_in instead could be a reason the user is willing to give up on clone. But that’s not necessarily all potential users of an arena bump allocator.

Storing a full handle inside every box is almost always going to be prohibitively expensive.

I think this is an exaggeration. Many people use Vec<T> even though it has a 3× larger size_of than https://crates.io/crates/thin-vec. This extra size has a cost, but maybe that cost is not part of the bottleneck.

By this logic, we shouldn't do Box<T, A> at all.

Maybe! I’ve actually been considering that if we experiment outside of the rust-lang/rust repository, then we could publish that on crates.io, and people could start relying on that crate. At that point, especially if #1 proves problematic and we’d need separate types regardless, maybe a widely-accepted library on crates.io is not a bad end point?

Not everything must be in the standard library.

I don't think the length of the thread is a good metric

At least more than a yes or no like #8. And any solution would add complexity in type signatures even for people not relying on this feature. I do think we have a complexity budget to spend carefully.

Scott J Maddox · Answer 48 · Fri May 10 2019 10:11:34 GMT+0800 (China Standard Time)

I think we’re in agreement on this. I was saying that if all handles are zero-size (e.g. you have a malloc-and-free-style allocator with global state) then this thread is not relevant. If even deallocation requires a non-zero-size handle (e.g. an allocator with multiple instances/arenas/regions that reuses freed space) then this thread is also not relevant.

But I'm saying that you're missing an important use case, if not more than one. I gave the arena allocator example in my last post. Having a separate Dealloc does potentially matter there.

Yes, using clone_in instead could be a reason the user is willing to give up on clone. But that’s not necessarily all potential users of an arena bump allocator.

No, it's not. But without a separate Dealloc no one can choose. With a separate Dealloc, everyone can choose precisely what features they need. The use cases served by having a separate Dealloc trait are a strict superset of the use cases served without a separate Dealloc trait.

Storing a full handle inside every box is almost always going to be prohibitively expensive.

I think this is an exaggeration. Many people use Vec<T> even though it has a 3× larger size_of than https://crates.io/crates/thin-vec. This extra size has a cost, but maybe that cost is not part of the bottleneck.

Okay, let me be more precise: storing a full handle inside every Box is unlikely to be chosen, assuming my BuildAlloc proposal is accepted. The extra overhead is unnecessary.

By this logic, we shouldn't do Box<T, A> at all.

Maybe! I’ve actually been considering that if we experiment outside of the rust-lang/rust repository, then we could publish that on crates.io, and people could start relying on that crate. At that point, especially if #1 proves problematic and we’d need separate types regardless, maybe a widely-accepted library on crates.io is not a bad end point?

Not everything must be in the standard library.

I do think experimenting with all of this in separate crates is a good idea. Iteration can happen much faster outside of the std lib. This would also make it much more feasible to directly compare the performance of having a separate Dealloc for arena allocators, for example.

Are there any limitations that currently prevent a full-featured custom Box type? I had played with something like this for a custom arena allocator a bit over a year ago, but ended up dropping it after a couple days.

I don't think the length of the thread is a good metric

At least more than a yes or no like #8. And any solution would add complexity in type signatures even for people not relying on this feature. I do think we have a complexity budget to spend carefully.

The added type signature complexity is definitely a concern. If separate and full-featured BoxIn, etc. types can be created in a crates.io crate, then I do think that's a better place to start.

Simon Sapin · Answer 49 · Fri May 10 2019 16:32:02 GMT+0800 (China Standard Time)

Are there any limitations that currently prevent a full-featured custom Box type?

Leaving aside features that “merely” require Nightly (e.g. implementing the CoerceUnsized trait), one feature of std::boxed::Box that is built into the language and cannot (today) be replicated by a library is moving a !Copy value out of a box. There’s some desire to eventually have a DerefMove trait, but it doesn’t exist yet.
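The limitation can be seen in a small sketch. `MyBox` is a hypothetical library-defined wrapper invented for this illustration, not a type from any crate; it only implements `Deref`, so it has to offer an explicit consuming method where the built-in `Box` can use `*`.

```rust
use std::ops::Deref;

// Hypothetical stand-in for a library-defined box; not a real crate type.
struct MyBox<T>(Box<T>);

impl<T> Deref for MyBox<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

// The built-in `Box` allows moving a non-`Copy` value out through `*`.
fn move_out_of_std_box(b: Box<String>) -> String {
    *b
}

impl<T> MyBox<T> {
    // A library type needs a consuming method instead. Writing `*my_box`
    // to move the value out is rejected with error E0507.
    fn into_inner(self) -> T {
        *self.0
    }
}
```

With only `Deref` available, `*my_box` yields a place behind a shared reference, which is why the move has to go through `into_inner` here.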

Scott J Maddox · Answer 50 · Fri May 10 2019 22:37:38 GMT+0800 (China Standard Time)

Leaving aside features that “merely” require Nightly (e.g. implementing the CoerceUnsized trait), one feature of std::boxed::Box that is built into the language and cannot (today) be replicated by a library is moving a !Copy value out of a box. There’s some desire to eventually have a DerefMove trait, but it doesn’t exist yet.

That's a pretty big limitation...

Tim Diekmann · Answer 51 · Sat Oct 05 2019 07:41:31 GMT+0800 (China Standard Time)

trait Alloc {
    type Realloc: Realloc;
    type Dealloc: Dealloc;
    ...
}
trait Realloc {
    type Dealloc: Dealloc;
    ...
}

@scottjmaddox I don't really see the point of splitting Alloc and Realloc when Alloc requires a Realloc. Wouldn't the other way around make more sense, such that Realloc has an associated Alloc? Or even leave out the association altogether?

Tim Diekmann · Answer 52 · Sat Oct 05 2019 07:54:52 GMT+0800 (China Standard Time)

I updated the OP. If any proposal is missing, please @TimDiekmann me 🙂

Scott J Maddox · Answer 53 · Sun Oct 06 2019 01:11:48 GMT+0800 (China Standard Time)

@scottjmaddox I don't really see the point of splitting Alloc and Realloc when Alloc requires a Realloc. Wouldn't the other way around make more sense, such that Realloc has an associated Alloc? Or even leave out the association altogether?

You might be right. I suppose having the realloc method defined in Alloc is the same thing. And then it can have a default implementation of just alloc new, move, dealloc old. No need for trait specialization.
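A minimal sketch of that default, assuming a simplified trait: `SimpleAlloc` and `GlobalHeap` are names made up for this example and the signatures are not those of std or alloc-wg. The implementor only supplies `alloc` and `dealloc`, and `realloc` falls back to allocate, copy, free.

```rust
use std::alloc::{self, Layout};
use std::ptr::{self, NonNull};

// Hypothetical trait for illustration; not the signature used by std or alloc-wg.
trait SimpleAlloc {
    fn alloc(&mut self, layout: Layout) -> Option<NonNull<u8>>;

    unsafe fn dealloc(&mut self, ptr: NonNull<u8>, layout: Layout);

    // Default: allocate a new block, copy the old contents, free the old block.
    unsafe fn realloc(
        &mut self,
        ptr: NonNull<u8>,
        old_layout: Layout,
        new_size: usize,
    ) -> Option<NonNull<u8>> {
        let new_layout = Layout::from_size_align(new_size, old_layout.align()).ok()?;
        let new_ptr = self.alloc(new_layout)?;
        // Copy only as many bytes as fit in both the old and the new block.
        let count = old_layout.size().min(new_size);
        unsafe {
            ptr::copy_nonoverlapping(ptr.as_ptr(), new_ptr.as_ptr(), count);
            self.dealloc(ptr, old_layout);
        }
        Some(new_ptr)
    }
}

// Forwards to the global allocator and relies on the default `realloc`.
struct GlobalHeap;

impl SimpleAlloc for GlobalHeap {
    fn alloc(&mut self, layout: Layout) -> Option<NonNull<u8>> {
        // This sketch does not support zero-sized layouts.
        if layout.size() == 0 {
            return None;
        }
        NonNull::new(unsafe { alloc::alloc(layout) })
    }

    unsafe fn dealloc(&mut self, ptr: NonNull<u8>, layout: Layout) {
        unsafe { alloc::dealloc(ptr.as_ptr(), layout) }
    }
}
```

An allocator with a cheaper native resize could still override `realloc`; everything else gets the fallback without specialization.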

Tim Diekmann · Answer 54 · Sun Oct 06 2019 05:08:07 GMT+0800 (China Standard Time)

I'm currently experimenting with this: https://github.com/TimDiekmann/alloc-wg

So far I'm using three traits together, each associated with its own BuildAlloc. Currently I have only tested Box, which doesn't need realloc at all. I think we can decide whether it is useful once RawVec is implemented.

Tim Diekmann · Answer 55 · Thu Oct 10 2019 06:21:05 GMT+0800 (China Standard Time)

I don't think splitting Realloc makes much sense here. For RawVec::reserve_internal the bounds A: Alloc + Realloc + Dealloc would be needed:

Dealloc for the type bound
Alloc for initial reserving (After creating with RawVec::new)
Realloc for further reserving

Maybe it would make sense to introduce this hierarchy:

trait Dealloc {}
trait Alloc: Dealloc {}
trait Realloc: Alloc {}
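As a rough sketch of how this hierarchy would affect bounds: a single `A: Realloc` bound is enough for a reserve-like function, because the supertraits bring `Alloc` and `Dealloc` along. `Heap` and `Buf` are hypothetical types for this example, and the methods only report which path ran instead of touching memory.

```rust
// Hypothetical traits for illustration; the methods only record which path ran.
trait Dealloc {
    fn dealloc(&mut self) {}
}

trait Alloc: Dealloc {
    fn alloc(&mut self) -> &'static str {
        "alloc"
    }
}

trait Realloc: Alloc {
    fn realloc(&mut self) -> &'static str {
        "realloc"
    }
}

struct Heap;
impl Dealloc for Heap {}
impl Alloc for Heap {}
impl Realloc for Heap {}

// Stand-in for a `RawVec`-like buffer that only tracks its capacity.
struct Buf<A: Dealloc> {
    cap: usize,
    alloc: A,
}

impl<A: Realloc> Buf<A> {
    // The single `Realloc` bound implies `Alloc` and `Dealloc` through the supertraits.
    fn reserve(&mut self, extra: usize) -> &'static str {
        let path = if self.cap == 0 {
            self.alloc.alloc()
        } else {
            self.alloc.realloc()
        };
        self.cap += extra;
        path
    }
}

// Dropping only needs the weakest bound.
impl<A: Dealloc> Drop for Buf<A> {
    fn drop(&mut self) {
        if self.cap != 0 {
            self.alloc.dealloc();
        }
    }
}
```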

Tim Diekmann · Answer 56 · Fri Oct 11 2019 08:53:28 GMT+0800 (China Standard Time)

I actually found a use case for splitting Realloc: An allocator without Realloc cannot move memory around until deallocating.

Scott J Maddox · Answer 57 · Sat Oct 12 2019 02:56:01 GMT+0800 (China Standard Time)

Interesting point. Since we already have Pin, though, are there any cases where you might want that? Perhaps for FFI interactions?

Tim Diekmann · Answer 58 · Sun Jan 26 2020 20:41:36 GMT+0800 (China Standard Time)

I'll try to summarize this issue.

We propose splitting dealloc and realloc out of AllocRef. Splitting the trait allows users to specify increasingly strict constraints depending on the use case. For example, it's possible to use FFI-allocated memory in an allocator by only implementing DeallocRef. Not implementing ReallocRef ensures that the pointer returned by alloc stays valid for the provided layout until dealloc is called; the memory will never move.

In the alloc-wg crate I'm using this (adapted) design:

trait DeallocRef {
    unsafe fn dealloc(&mut self, ptr: NonNull<u8>, layout: Layout);
}

trait AllocRef: DeallocRef {
    fn alloc(&mut self, layout: Layout) -> Result<NonNull<u8>, AllocErr>;

    fn alloc_zeroed(&mut self, layout: Layout) -> Result<NonNull<u8>, AllocErr> {
        // fallback to `alloc` and `write_bytes`
    }

    // ... other methods neither listed in `DeallocRef` nor `ReallocRef`
}

trait ReallocRef: AllocRef {
    fn realloc(
        &mut self,
        ptr: NonNull<u8>,
        old_layout: Layout,
        new_size: usize,
    ) -> Result<NonNull<u8>, AllocErr> {
        // fallback to `alloc`, `ptr::copy_nonoverlapping`, and `dealloc`
    }
}

Next, I'll discuss why I have not used associated types on AllocRef and ReallocRef instead.

The implementation for this would probably look like this:

trait DeallocRef {}

trait AllocRef {
    type DeallocRef: DeallocRef;
}

trait ReallocRef {
    type AllocRef: AllocRef;
}

The main advantage of the associated type approach is that you could use the same DeallocRef for different AllocRefs. It's also possible to use the same AllocRef for different ReallocRefs.

A typical call to reserve on a collection has at least one branch:

if self.is_empty() {
    alloc()
} else {
    realloc()
}

This means that the collection needs access to potentially two different allocators at once in one function. There are three ways I came up with:

Storing both allocators alongside the deallocator in the struct. This also introduces more generic parameters.
Requiring the two allocators to be the same struct. As this is the same as the first solution, this solves nothing.
Using trait BuildDealloc {}, trait BuildAlloc: BuildDealloc {}, and trait BuildRealloc: BuildAlloc {}, similar to #12. The collection would store one builder and the generic parameter would be the builder's type. While every builder can build a DeallocRef, some builders may also build an AllocRef or ReallocRef. A huge downside of this is type inference: TimDiekmann/alloc-wg#5.

So the only viable solution would be 1., but I don't think it's worth using three parameters for allocating.

Another (IMO) minor downside of using this approach is the lack of a default implementation for realloc.

Hybrid approaches of both worlds are also possible, but the same downsides apply to them as well.

Amanieu d'Antras · Answer 59 · Mon Jan 27 2020 08:33:44 GMT+0800 (China Standard Time)

I actually found a use case for splitting Realloc: An allocator without Realloc cannot move memory around until deallocating.

I don't understand your point here. Since realloc can simply be implemented as alloc+copy+dealloc, it is irrelevant whether the allocator specifically supports it or not.

I think the AllocRef/DeallocRef split is already introducing quite a lot of complexity, adding a ReallocRef to this feels unnecessary and counterproductive.

Tim Diekmann · Answer 60 · Tue Jan 28 2020 06:37:33 GMT+0800 (China Standard Time)

I think you are right that there are no real use cases for ReallocRef. I think the only minimal advantage would be a possible specialization in case ReallocRef is not implemented, but that doesn't come close to outweighing the disadvantages of a third trait.

Scott J Maddox · Answer 61 · Tue Jan 28 2020 11:04:50 GMT+0800 (China Standard Time)

Can someone summarize the value of a separate Dealloc trait? The only advantage I can remember is that it would be easier to optimize away access when Dealloc is a no-op, e.g. in a bump allocator. Are there others? If not, we should probably just see if the compiler has trouble eliminating the dead code in this case. If it doesn't, there's (probably?) no need for this.

Tim Diekmann · Answer 62 · Tue Jan 28 2020 11:07:00 GMT+0800 (China Standard Time)

You can use memory that you don't know how to allocate. For example, you may use a pointer from FFI and want to deallocate it in Rust.

Scott J Maddox · Answer 63 · Tue Jan 28 2020 11:09:46 GMT+0800 (China Standard Time)

In that case, would it make more sense to just implement Drop on a wrapper type? I think that's standard practice for FFI right now.

Tim Diekmann · Answer 64 · Tue Jan 28 2020 11:10:44 GMT+0800 (China Standard Time)

If you get a pointer to an array, you could just use it in a Vec with all its features.

Amanieu d'Antras · Answer 65 · Tue Jan 28 2020 11:11:59 GMT+0800 (China Standard Time)

Basically the idea is that Dealloc doesn't need to contain a pointer to the allocator itself, it can "derive" that pointer from the pointer passed in to the dealloc call.

The intended use case is for Box to only require Dealloc for dropping, which would avoid needing to double the size of Box.
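The size argument can be checked with a small sketch. Both box layouts below are hypothetical and exist only for this comparison; `FromPointer` stands for a zero-sized deallocator that would locate its allocator from the pointer passed to dealloc, which is not implemented here.

```rust
use std::mem::size_of;

// Placeholder for one allocator instance (for example an arena).
struct Arena;

// Hypothetical box that stores a full handle to its allocator instance.
struct BoxWithHandle<'a, T, A> {
    ptr: *mut T,
    alloc: &'a A,
}

// Hypothetical box that stores only a deallocator value.
struct BoxWithDealloc<T, D> {
    ptr: *mut T,
    dealloc: D,
}

// Zero-sized deallocator: a real one would derive its allocator from the pointer.
struct FromPointer;

// Returns (pointer size, size with a handle, size with a zero-sized deallocator).
fn box_sizes() -> (usize, usize, usize) {
    (
        size_of::<*mut u64>(),
        size_of::<BoxWithHandle<'static, u64, Arena>>(),
        size_of::<BoxWithDealloc<u64, FromPointer>>(),
    )
}
```

On common targets the handle variant is two pointers wide, while the variant with the zero-sized deallocator stays at one pointer.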

Scott J Maddox · Answer 66 · Tue Jan 28 2020 11:19:07 GMT+0800 (China Standard Time)

In the case of Vec, am I correct in thinking that functionality would be limited, or at least different for the Dealloc-only types? For example, push would not be allowed, since it would fail when at capacity.

Lokathor · Answer 67 · Tue Jan 28 2020 11:30:20 GMT+0800 (China Standard Time)

wrapping an array pointer as a vec that knows how to dealloc but not alloc sounds like a bad time.

Similarly, boxes that do nothing on drop sounds very niche. Even a frame allocator should be ref counting the allocations made and dropped or you can't safely reset the allocator.

I'm not saying there's no possible case for this, but those two cases don't feel like they hold up

Scott J Maddox · Answer 68 · Tue Jan 28 2020 11:35:59 GMT+0800 (China Standard Time)

Similarly, boxes that do nothing on drop sounds very niche.

Bump allocators aren't that niche. They're used quite frequently when allocation is a bottleneck.

Even a frame allocator should be ref counting the allocations made and dropped or you can't safely reset the allocator.

Not if you leverage lifetimes; you can have the compiler prove that the allocator is safe to reset.
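A minimal sketch of that idea, assuming a toy fixed-capacity arena (`Bump` is a hypothetical type that only hands out `u64` slots): every reference returned by `alloc` borrows the arena, and `reset` takes `&mut self`, so the borrow checker rejects a reset while any allocation is still in use.

```rust
use std::cell::{Cell, UnsafeCell};

// Hypothetical fixed-capacity bump arena for `u64` values.
struct Bump {
    slots: UnsafeCell<[u64; 8]>,
    used: Cell<usize>,
}

impl Bump {
    fn new() -> Self {
        Bump {
            slots: UnsafeCell::new([0; 8]),
            used: Cell::new(0),
        }
    }

    // The returned reference borrows `self`, so it cannot outlive a later `reset`.
    fn alloc(&self, value: u64) -> Option<&mut u64> {
        let i = self.used.get();
        if i == 8 {
            return None;
        }
        self.used.set(i + 1);
        // SAFETY: slot `i` is in bounds and handed out at most once between
        // resets, so no two live references point at the same slot.
        let slot = unsafe { &mut *self.slots.get().cast::<u64>().add(i) };
        *slot = value;
        Some(slot)
    }

    // `&mut self` requires that no reference returned by `alloc` is still alive.
    fn reset(&mut self) {
        self.used.set(0);
    }
}
```

No reference counting is involved: calling `reset` while a value from `alloc` is still used afterwards fails to compile.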

Tim Diekmann · Answer 69 · Tue Jan 28 2020 11:36:15 GMT+0800 (China Standard Time)

Many FFI APIs return a pointer to an array. What's wrong with just putting this in a vec-like struct and using it? The collection will handle the deallocation later. The same applies to boxes.

Amanieu d'Antras · Answer 70 · Tue Jan 28 2020 11:38:29 GMT+0800 (China Standard Time)

Many FFI APIs return a pointer to an array. What's wrong with just putting this in a vec-like struct and using it? The collection will handle the deallocation later.

I would expect such pointers to go into a boxed slice, not a vec. Basically, I see Box as the only potential use case for a separate Dealloc trait. Any complex collection will want to use the full Alloc trait.

Lokathor · Answer 71 · Tue Jan 28 2020 13:03:26 GMT+0800 (China Standard Time)

A pointer into a foreign-allocated array needs to be a slice on the rust side of things. Or something like a "slice vec" type. Or other abstraction that is designed around being aware it can't realloc the memory. A normal Vec is not that type.

Tim Diekmann · Answer 72 · Wed Jan 29 2020 18:13:54 GMT+0800 (China Standard Time)

I propose to split the AllocRef trait like the first proposal in #9 (comment).

DeallocRef is not dependent on AllocRef. With this design, it's possible to have the concept of a deleter like in C++'s std::shared_ptr. This can be especially useful in FFI applications, where Rust receives an allocated memory block and a dealloc function.
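A rough sketch of the deleter idea, reusing the shape of the DeallocRef trait quoted earlier in the thread. Everything else is an assumption made for illustration: `foreign_new` and `foreign_free` stand in for a foreign library and are implemented in Rust here, and `ForeignBox`, `ForeignFree`, and the `FREED` counter are hypothetical.

```rust
use std::alloc::Layout;
use std::ptr::NonNull;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts releases so the effect of dropping can be observed.
static FREED: AtomicUsize = AtomicUsize::new(0);

// Same shape as the `DeallocRef` trait quoted earlier in the thread.
trait DeallocRef {
    unsafe fn dealloc(&mut self, ptr: NonNull<u8>, layout: Layout);
}

// Hypothetical stand-ins for a foreign library that hands out a block
// together with a matching free function.
fn foreign_new(value: u32) -> *mut u32 {
    Box::into_raw(Box::new(value))
}

unsafe fn foreign_free(ptr: *mut u32) {
    drop(unsafe { Box::from_raw(ptr) });
    FREED.fetch_add(1, Ordering::SeqCst);
}

// Zero-sized deleter that only knows how to release foreign blocks.
struct ForeignFree;

impl DeallocRef for ForeignFree {
    unsafe fn dealloc(&mut self, ptr: NonNull<u8>, _layout: Layout) {
        unsafe { foreign_free(ptr.as_ptr().cast()) }
    }
}

// Box-like owner that needs `DeallocRef` only, never an allocating trait.
struct ForeignBox<D: DeallocRef> {
    ptr: NonNull<u32>,
    deleter: D,
}

impl<D: DeallocRef> ForeignBox<D> {
    // Safety: `ptr` must be null or a valid block that `deleter` can release.
    unsafe fn from_raw(ptr: *mut u32, deleter: D) -> Option<Self> {
        NonNull::new(ptr).map(|ptr| ForeignBox { ptr, deleter })
    }

    fn get(&self) -> u32 {
        unsafe { *self.ptr.as_ptr() }
    }
}

impl<D: DeallocRef> Drop for ForeignBox<D> {
    fn drop(&mut self) {
        unsafe { self.deleter.dealloc(self.ptr.cast(), Layout::new::<u32>()) }
    }
}
```

Because `ForeignFree` is zero-sized, `ForeignBox<ForeignFree>` stays one pointer wide, and the type never needs to know how the block was allocated.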

We could stay conservative and only split dealloc, but as the changes will be nightly-only for now, we could also split realloc and see how people react.

The next step is review by the rest of the tagged wg members:

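| | split `dealloc` | split `realloc` | postpone | close |
|---|---|---|---|---|
| @Amanieu | | | 😕 | |
| @Ericson2314 | | | | |
| @glandium | | | | |
| @gnzlbg | | | | |
| @Lokathor | | | | 👎 |
| @scottjmaddox | 👍 | | | |
| @TimDiekmann | 👍 | | | |
| @Wodann | 👍 | ❤️ | | |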

Please vote with

👍 split dealloc
❤️ also split realloc (implies 👍)
😕 postpone (please comment)
👎 close (please comment)

Jelte Fennema-Nio · Answer 73 · Wed Jan 29 2020 22:54:02 GMT+0800 (China Standard Time)

I think based on these comments splitting dealloc indeed makes sense for FFI, since there you can free but not allocate. Example: https://github.com/facebook/rocksdb/blob/master/include/rocksdb/c.h#L1747

As for splitting realloc, I agree with @Amanieu: there's no need for it. If you don't have a realloc function in your allocator, the default implementation of malloc + memcpy should work fine.

Regarding:

A pointer into a foreign-allocated array needs to be a slice on the rust side of things. Or something like a "slice vec" type. Or other abstraction that is designed around being aware it can't realloc the memory. A normal Vec is not that type.

You can use a Box<[T]> (boxed slice) for that. Vec requires an append implementation, which I don't see a way of implementing without alloc/realloc.

Tim Diekmann · Answer 74 · Wed Jan 29 2020 23:14:11 GMT+0800 (China Standard Time)

If we decide to introduce either of the traits, this would be done in three steps:

Provide an empty trait for DeallocRef (and ReallocRef)
Update miri and the nomicon to import both traits
Actually split AllocRef

Wodann · Answer 75 · Thu Jan 30 2020 00:55:41 GMT+0800 (China Standard Time)

If we think that the ReallocRef change is going to be controversial, it might be a good idea to split it into two separate PRs; one for splitting the DeallocRef trait and one for splitting the ReallocRef trait. That way if one gets rejected/reverted, the rest of the work won't be affected.

Tim Diekmann · Answer 76 · Thu Jan 30 2020 09:19:59 GMT+0800 (China Standard Time)

I'm fine with that.

Does anyone want to write the documentation for DeallocRef and adjust AllocRef a bit?
Update: @Wodann will do this

Wodann · Answer 77 · Thu Jan 30 2020 16:52:46 GMT+0800 (China Standard Time)

Do you still want to create the actual PR to rustc? It might be good if it is the same person making PRs (for recognisability). In that case someone'd need to push/pr to your fork.

Amanieu d'Antras · Answer 78 · Thu Jan 30 2020 17:04:02 GMT+0800 (China Standard Time)

I am strongly opposed to having a separate ReallocRef trait. There is no good reason for splitting it away from AllocRef when a perfectly good default implementation needs only alloc and dealloc.

Regarding DeallocRef I am concerned that the use case for it is somewhat niche: it only really helps Box, which isn't really used that much with custom allocators (as far as I know). The use case I am currently looking at is a compiler where I use a bumpalo instance per function, so that I can quickly allocate memory for internal data structures while compiling (mainly Vec and HashMap) and discard them all once I am done compiling a function.

Lokathor · Answer 79 · Thu Jan 30 2020 17:38:36 GMT+0800 (China Standard Time)

Just leak the box and keep the &mut around however long you would have kept the box around
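As a sketch of this suggestion with today's global-allocator `Box` (`leak_value` is a hypothetical helper): `Box::leak` gives up ownership and returns a plain mutable reference, so no destructor and no dealloc call are involved. With an arena-backed box the reference would presumably be bounded by the arena's lifetime instead of `'static`.

```rust
// Leaks the allocation on purpose: the memory is never returned to the allocator.
fn leak_value(value: u32) -> &'static mut u32 {
    Box::leak(Box::new(value))
}
```

The reference is as usable as the box was, but the value's destructor never runs, which matters for types that own other resources.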

Tim Diekmann · Answer 80 · Thu Jan 30 2020 20:06:59 GMT+0800 (China Standard Time)

@Wodann

Do you still want to create the actual PR to rustc? It might be good if it is the same person making PRs (for recognisability).

Yes, I think that makes sense.

In that case someone'd need to push/pr to your fork.

One could also make a PR to alloc-wg, or post the docs here, in Zulip, or send them to me via email or PM in Discord 🙂

@Amanieu @Lokathor Could you also give a vote?

Lokathor · Answer 81 · Fri Jan 31 2020 02:40:46 GMT+0800 (China Standard Time)

i vote this idea is just way too niche

Scott J Maddox · Answer 82 · Fri Jan 31 2020 08:03:35 GMT+0800 (China Standard Time)

Regarding DeallocRef I am concerned that the use case for it is somewhat niche: it only really helps Box, which isn't really used that much with custom allocators (as far as I know).

That's one of the things this working group is trying to fix, though. It would be much better if Rust std lib collections had first-class support for custom allocators.

Tim Diekmann · Answer 83 · Sun Feb 02 2020 21:59:08 GMT+0800 (China Standard Time)

Can I assume that the majority voted for splitting dealloc (not realloc!) and I can go ahead? While @JelteF and @stevenlr are not tagged, I don't want to ignore those votes as we don't have a fixed membership.

Steven Le Rouzic · Answer 84 · Mon Feb 03 2020 00:43:51 GMT+0800 (China Standard Time)

I originally voted to also split realloc for consistency but @Amanieu's arguments had me change my mind. Updated my vote above, thanks for taking it into account. :)

Tim Diekmann · Answer 85 · Wed Feb 12 2020 22:09:30 GMT+0800 (China Standard Time)

Whoops, wrong issue... Sorry for the noise

Tim Diekmann · Answer 86 · Wed Aug 05 2020 17:44:39 GMT+0800 (China Standard Time)

Will close this as we probably won't implement this. If we do, we can still reopen it (again).