Control over children ordering

Question

Drakulix opened this issue 8 years ago

Victoria Brekenfeld commented 8 years ago

I am wondering if any guarantees on the order of children would be a goal or non-goal of this library?

It seems insert_with_parent currently only pushes to the end of the children list.
If I want to replace a child and keep its current position, this does not seem to be easily possible, and I don't even know whether I can rely on insert_with_parent always appending at the end, because this is not documented.

For my use case I just have a left and a right child, and it would be great to be able to differentiate between those. But I have no good idea how to provide this functionality in the context of this library with an arbitrary number of child nodes.

Possible API suggestions:

- Split insert into two functions that push at the front and at the back of the children list, respectively.
  - Not a very nice API in my opinion. The use case is not very obvious at first.
  - Actual sorting would be difficult to emulate given only these functions
  - Enough for my use case
  - Easy to implement
- Allow sorting of children via a closure
  - Makes a good API
  - Can be used to reorder children after "replacing" (remove/insert), but needs to track old positions manually
  - That means not very nice for my use case
  - Easy to implement (https://doc.rust-lang.org/stable/std/vec/struct.Vec.html#method.sort_by_key)
- Allow swapping of children
  - Also not a very nice API. The use case is not very obvious at first.
  - Actual sorting would be difficult to emulate given only these functions
  - Enough for my use case
  - Easy to implement
- Allow replacement of Nodes and swapping of Nodes.
  - Swapping could take two NodeIds
  - Replacing could insert a new node, removing the old one in the process.
  - Still does not make very strong order guarantees except to keep the old order
  - Feels like it has use cases beyond this

I mostly opened this issue to discuss whether you would want to support any order guarantees at all. The rest is just a little bit of brainstorming; please feel free to edit, extend, or ignore it however you like.
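To make the swap and replace suggestions concrete, here is a minimal sketch on a toy arena tree. Everything in it is an assumption for illustration: `Tree`, `Node`, `insert`, `swap_children` and `replace_child` are hypothetical names, plain `usize` indices stand in for NodeIds, and invalid indices simply panic. It is not the library's actual API.

```rust
/// Toy node; indices into `Tree::nodes` stand in for NodeIds (hypothetical).
pub struct Node<T> {
    pub data: T,
    pub parent: Option<usize>,
    pub children: Vec<usize>,
}

/// Toy arena tree, only meant to illustrate child ordering.
pub struct Tree<T> {
    pub nodes: Vec<Node<T>>,
}

impl<T> Tree<T> {
    pub fn new() -> Self {
        Tree { nodes: Vec::new() }
    }

    /// Insert a node; a child is always appended after its existing siblings.
    pub fn insert(&mut self, data: T, parent: Option<usize>) -> usize {
        let id = self.nodes.len();
        self.nodes.push(Node { data, parent, children: Vec::new() });
        if let Some(p) = parent {
            self.nodes[p].children.push(id);
        }
        id
    }

    /// Swap the positions of two children of `parent`.
    /// Returns false if either id is not a child of `parent`.
    pub fn swap_children(&mut self, parent: usize, a: usize, b: usize) -> bool {
        let kids = &mut self.nodes[parent].children;
        let ia = kids.iter().position(|&c| c == a);
        let ib = kids.iter().position(|&c| c == b);
        match (ia, ib) {
            (Some(ia), Some(ib)) => {
                kids.swap(ia, ib);
                true
            }
            _ => false,
        }
    }

    /// Put a new node into the slot currently held by `old`, keeping the
    /// sibling order. Returns None if `old` has no parent.
    pub fn replace_child(&mut self, old: usize, data: T) -> Option<usize> {
        let parent = self.nodes[old].parent?;
        let slot = self.nodes[parent].children.iter().position(|&c| c == old)?;
        let new_id = self.nodes.len();
        self.nodes.push(Node { data, parent: Some(parent), children: Vec::new() });
        self.nodes[parent].children[slot] = new_id;
        // Detach the replaced node so it no longer points into the tree.
        self.nodes[old].parent = None;
        self.nodes[old].children.clear();
        Some(new_id)
    }
}
```

With a left and a right child, replacing the left one keeps it in the first slot, and a swap exchanges the two slots without touching any other sibling.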

Ian Burns · Answer 1 · Wed Dec 14 2016 23:22:08 GMT+0800 (China Standard Time)

Thanks again for submitting an issue, all of this is very much appreciated!

You are absolutely correct in saying that insert_with_parent only ever pushes to the end of the child array. However, this is just the way I happened to implement the system, so I didn't originally mean for there to be any guarantees there.

That's not to say that I'm against having any guarantees about child ordering, it's just that I haven't made any explicit efforts to guarantee any specific behavior so far. As you said, this is not documented anywhere, so that should probably be fixed at some point (hopefully it will be fixed shortly after we hash this out because I think this is a discussion that is definitely needed).

Right now I'm leaning towards having insert_with_parent's behavior stay the way it is and documenting the fact that new child Nodes will always be inserted "after" existing children. I think this behavior is nice for several reasons (some of which may be debatable):

- I think this behavior is what most people would expect from such a function
- It should be fast (assuming that the underlying Vec doesn't need to re-allocate space)
- It's a pretty simple implementation

Basically, I want the ordering to be guaranteed not to change unless the caller explicitly asks for it to change. This is similar to how the caller shouldn't need to worry about NodeIds becoming invalid unless the caller explicitly clones a NodeId (and then proceeds to remove the corresponding Node from the Tree).

With that in mind, I agree it would be very nice to be able to sort the children of a Node, so I'm thinking we definitely need something like sort_children_by_key/sort_children_by (maybe both?). I did have one question on this one though: could you clarify what you mean by '...but needs to track old positions manually'?

I do also think it would be very nice to have a replace function and a swap function as I can imagine those could be very useful in certain scenarios.

What do you think about the above approach? I know I basically just responded with "I like all of those, let's do all of them", but I think most of those functions are things that people would expect from a tree library, so they'll be nice to have.

Victoria Brekenfeld · Answer 2 · Thu Dec 15 2016 01:43:58 GMT+0800 (China Standard Time)

> With that in mind, I agree it would be very nice to be able to sort the children of a Node, so I'm thinking we definitely need something like sort_children_by_key/sort_children_by (maybe both?). I did have one question on this one though: could you clarify what you mean by '...but needs to track old positions manually'?

That is just relevant to my use case. If you want to keep the insertion order and replace a child with just the sort functions available, you would have to remember the insertion order prior to making the modifications to restore it later. A real replace function would be a much better solution.
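A small sketch of what "tracking old positions manually" would look like if only remove, append and sort were available. The function name `replace_via_sort` and the use of bare `u32` ids are assumptions made for this illustration, not part of the library.

```rust
use std::collections::HashMap;

/// Replace `old` with `new` in a child list using only remove, append and
/// sort_by_key. The caller-side bookkeeping is the `position` map.
pub fn replace_via_sort(children: &mut Vec<u32>, old: u32, new: u32) {
    // Remember every child's position before modifying anything.
    let mut position: HashMap<u32, usize> = children
        .iter()
        .enumerate()
        .map(|(i, &id)| (id, i))
        .collect();

    // The new child inherits the slot of the one it replaces.
    if let Some(slot) = position.remove(&old) {
        position.insert(new, slot);

        // "Replace" as remove followed by an insert that appends at the end.
        children.retain(|&id| id != old);
        children.push(new);

        // Restore the remembered order.
        children.sort_by_key(|id| position[id]);
    }
}
```

The extra map and the final sort are exactly the overhead that a dedicated replace function would avoid.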

Alright, I am going to implement replace and swap, as well as both sort methods in the next days, maybe even today. I don't think there is anything wrong with exposing both functions.

Ian Burns · Answer 3 · Thu Dec 15 2016 02:25:07 GMT+0800 (China Standard Time)

> If you want to keep the insertion order and replace a child with just the sort functions available, you would have to remember the insertion order prior to making the modifications to restore it later. A real replace function would be a much better solution.

Ah, I gotcha, that makes sense. And yes, I agree, a real replace function would be better.

> Alright, I am going to implement replace and swap, as well as both sort methods in the next days, maybe even today. I don't think there is anything wrong with exposing both functions.

Awesome, I appreciate it! I think those will be great additions!

I would like to request that you make sure that both parent and children values get cleared out on the Node that is removed during replace. Just to help make sure that they don't live longer than they should on accident.

If any questions/concerns come up while you're working on those please feel free to ask.

Victoria Brekenfeld · Answer 4 · Thu Dec 15 2016 02:30:04 GMT+0800 (China Standard Time)

> I would like to request that you make sure that both parent and children values get cleared out on the Node that is removed during replace. Just to help make sure that they don't live longer than they should on accident.

Sure. I will use the existing functions as a reference.

Victoria Brekenfeld · Answer 5 · Thu Dec 15 2016 05:02:55 GMT+0800 (China Standard Time)

I am running into some problems implementing sort_by or sort_by_key.

I cannot directly use self.children_mut().sort_by(f), because this only returns NodeIds. But if you can call sort on a Node, you must have acquired this node by using tree.get_mut(), which means the Tree is already borrowed mutably and you cannot use it inside the closure to get a Node for the NodeId passed to you.

I am currently thinking the only way to work around this is to implement the sorting methods on Tree directly.
Do you have an opinion on how to implement this?

Ian Burns · Answer 6 · Thu Dec 15 2016 05:37:33 GMT+0800 (China Standard Time)

My first thought is to add a method like this to Tree:

    pub fn sort_children_by_data(&mut self, node_id: &NodeId) -> Result<(), NodeIdError> where T: Ord {
        let (is_valid, error) = self.is_valid_node_id(node_id);
        if !is_valid {
            return Result::Err(error.expect("Tree::sort_children_by_data: Missing an error value on finding an invalid NodeId."));
        }

        let mut children = self.get_unsafe(node_id).children().clone();

        children.sort_by_key(|a| {
            self.get_unsafe(a).data()
        });

        //set_children is a new (private) method.
        self.get_mut_unsafe(node_id).set_children(children);

        Result::Ok(())
    }

Notes on this:

- Sadly, this approach requires a clone.
- We would need a new method on the MutableNode Trait called set_children for this approach.
- Not sure how hard it is to allow a custom closure to be passed in, but this one is obviously hard-coded to compare Node::data() directly which requires T: Ord.

Again, this is just my first thought on how I would have done it, but there may be a better way to go about it.

Any thoughts on this? Does that help at all?

EDIT: I did run that and it does type-check properly (when I removed the call to set_children since that doesn't exist yet).
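On the open question of passing a custom closure: a rough sketch of the same clone-based approach, generalized over a key function, is shown below. It uses a toy `Tree`/`Node` pair with `usize` indices in place of NodeIds, skips the validity check, and the method name `sort_children_by_key` is only a suggestion; none of this is the library's existing code.

```rust
/// Toy types for illustration; indices stand in for NodeIds.
pub struct Node<T> {
    pub data: T,
    pub children: Vec<usize>,
}

pub struct Tree<T> {
    pub nodes: Vec<Node<T>>,
}

impl<T> Tree<T> {
    /// Sort the children of `parent` by a key computed from each child's data.
    /// Panics on an out-of-range index (the real method would validate first).
    pub fn sort_children_by_key<K, F>(&mut self, parent: usize, mut f: F)
    where
        K: Ord,
        F: FnMut(&T) -> K,
    {
        // Clone the child ids so that no borrow of the parent node is held
        // while the closure reads the other nodes.
        let mut children = self.nodes[parent].children.clone();
        children.sort_by_key(|&c| f(&self.nodes[c].data));
        // Write the sorted list back (the role set_children would play).
        self.nodes[parent].children = children;
    }
}
```

The `T: Ord` bound disappears because the ordering now comes from the key type `K` rather than from the node data itself.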

Victoria Brekenfeld · Answer 7 · Thu Dec 15 2016 05:44:49 GMT+0800 (China Standard Time)

Sure it does.
Passing a closure should not be a huge problem.

Is there any reason why you don't use children_mut() inside this function, now that there is the possibility of mutable access to the node? That would mean we need no clone and no set_children.

I will add those methods to the tree directly then and try to work around the cloning as described.

Ian Burns · Answer 8 · Thu Dec 15 2016 05:57:07 GMT+0800 (China Standard Time)

Well, originally I had this:

    pub fn sort_children_by_data(&mut self, node_id: &NodeId) -> Result<(), NodeIdError> where T: Ord {
        let (is_valid, error) = self.is_valid_node_id(node_id);
        if !is_valid {
            return Result::Err(error.expect("Tree::sort_children_by_data: Missing an error value on finding an invalid NodeId."));
        }

        // Mutable borrow of self starts here and lasts as long as `children` is used.
        let mut children = self.get_mut_unsafe(node_id).children_mut();

        children.sort_by_key(|a| {
            // Immutable borrow of self inside the closure: rejected with E0502.
            self.get_unsafe(a).data()
        });

        Result::Ok(())
    }

and I got this error:

error[E0502]: cannot borrow `self` as immutable because `*self` is also borrowed as mutable
   --> src\tree.rs:455:30
    |
453 |         let mut children = self.get_mut_unsafe(node_id).children_mut();
    |                            ---- mutable borrow occurs here
454 | 
455 |         children.sort_by_key(|a| {
    |                              ^^^ immutable borrow occurs here
456 |             self.get_unsafe(a).data()
    |             ---- borrow occurs due to use of `self` in closure
...
460 |     }
    |     - mutable borrow ends here

So that's why I opted for the immutable borrow on the node (note get_mut_unsafe vs get_unsafe) + clone the children idea.

But maybe there's another way to approach it that I'm just not seeing right now.

Victoria Brekenfeld · Answer 9 · Thu Dec 15 2016 06:05:39 GMT+0800 (China Standard Time)

Just ran into this problem as well.
This is a little frustrating, because we know that no child will be the same as node_id.

There is no easy way to work around this. self.nodes.split(n) might be used to get those separately without cloning, but I think for the readability of the source code, we should go with clone instead.
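For comparison, a sketch of what the split idea might look like. It assumes the nodes live in one slice and that something like `split_at_mut` is what was meant by `self.nodes.split(n)`; the toy `Node` type, the function name `sort_children_split` and the `usize` ids are all hypothetical. It also relies on the assumption stated above that a node is never its own child.

```rust
/// Toy node for illustration; indices into the slice stand in for NodeIds.
pub struct Node<T> {
    pub data: T,
    pub children: Vec<usize>,
}

/// Sort the children of node `n` by their data without cloning the child list.
/// Panics if `n` is out of bounds or if `n` lists itself as a child.
pub fn sort_children_split<T: Ord>(nodes: &mut [Node<T>], n: usize) {
    // Carve the slice into: everything before n, node n itself, everything after.
    let (left, right) = nodes.split_at_mut(n);
    let (parent, rest) = right.split_first_mut().expect("n out of bounds");

    // Only the parent needs mutable access; the other nodes are just read.
    let left: &[Node<T>] = left;
    let rest: &[Node<T>] = rest;

    parent.children.sort_by_key(|&c| {
        if c < n {
            &left[c].data
        } else {
            // Indices after n are shifted by the n + 1 nodes carved off in front.
            &rest[c - n - 1].data
        }
    });
}
```

The index arithmetic in the closure is the readability cost mentioned above, which is a fair argument for preferring the clone.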

Ian Burns · Answer 10 · Thu Dec 15 2016 06:43:53 GMT+0800 (China Standard Time)

Yeah, I agree it's pretty frustrating.

But I think you're right though, going with the clone approach is probably best.

Victoria Brekenfeld · Answer 11 · Thu Dec 15 2016 06:48:32 GMT+0800 (China Standard Time)

Just a thought:

We could add another method to MutableNode called take_children, that uses mem::swap to exchange the children with an empty Vec, which could be initialized using Vec::with_capacity(0) or even mem::uninitialized, if you would want to go with unsafe code.
That way we could manipulate the children directly without keeping a mutable reference to the node by taking ownership.

An empty Vec probably has a smaller performance impact than the cloning, but it lowers the readability of the code.

Ian Burns · Answer 12 · Thu Dec 15 2016 07:14:18 GMT+0800 (China Standard Time)

To be honest I try to avoid unsafe wherever possible, but I'm sure you've seen that I've already used unsafe a few times to avoid bounds checking (since we're doing the bounds checking already).

With that said, (and if I'm following your line of logic properly) I think this is probably a good idea.

Just to make sure I'm following you on this:

- add set_children to MutableNode
- add take_children to MutableNode
- in sort_by_[whatever] call take_children, sort the vec, and then call set_children

Is that correct?

Also, from the docs here:

> In particular, if you construct a Vec with capacity 0 via Vec::new(), vec![], Vec::with_capacity(0), or by calling shrink_to_fit() on an empty Vec, it will not allocate memory.

So mem::uninitialized might be overkill in this case?

Victoria Brekenfeld · Answer 13 · Thu Dec 15 2016 07:18:51 GMT+0800 (China Standard Time)

Yes that is correct.
And that does indeed sound like overkill in that case. I was trying to avoid a heap allocation for the Vec, but if it does not allocate in the first place, mem::uninitialized would only skip the initializer, which should not have a huge impact on performance.
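The agreed take/sort/set sequence could be sketched roughly like this, in safe code only. The types are toy stand-ins with `usize` ids, the methods sit directly on `Node` rather than on a MutableNode trait, and `std::mem::take` is used here as shorthand for swapping in an empty Vec; treat all of it as an assumption about how the final code might look.

```rust
use std::mem;

/// Toy node for illustration; indices stand in for NodeIds.
pub struct Node<T> {
    pub data: T,
    pub children: Vec<usize>,
}

impl<T> Node<T> {
    /// Move the child list out, leaving an empty Vec behind.
    /// The replacement comes from Vec::new(), which does not allocate.
    pub fn take_children(&mut self) -> Vec<usize> {
        mem::take(&mut self.children)
    }

    /// Put a (possibly reordered) child list back.
    pub fn set_children(&mut self, children: Vec<usize>) {
        self.children = children;
    }
}

pub struct Tree<T> {
    pub nodes: Vec<Node<T>>,
}

impl<T: Ord> Tree<T> {
    /// Sort the children of `parent` by their data without cloning the list.
    pub fn sort_children_by_data(&mut self, parent: usize) {
        // Taking ownership ends the mutable borrow of the parent node.
        let mut children = self.nodes[parent].take_children();
        // The tree can now be read freely while the owned Vec is sorted.
        children.sort_by(|&a, &b| self.nodes[a].data.cmp(&self.nodes[b].data));
        self.nodes[parent].set_children(children);
    }
}
```

Because the child list is owned while it is being sorted, the tree is only borrowed immutably inside the comparison, so the E0502 conflict does not arise.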

Ian Burns · Answer 14 · Thu Dec 15 2016 07:26:55 GMT+0800 (China Standard Time)

Honestly I think this is the best direction to go here.

~~I have no problem with your solutions in #5, but I would love to avoid the clone if possible.~~
You beat me to it haha! I'll comment more in the PR.