jeffparsons / rangemap

Map data structure whose keys are stored as ranges

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Thoughts on coalescing and mutability

Rua opened this issue · comments

commented

At the moment the only way to change the values in the map is to insert new ones. There is no way to update existing ones. I understand that this has to do with the coalescing behaviour; if you change something, you have to check afterwards if it can be merged with an adjacent range. But having no mutable operations also makes some tasks really cumbersome.

In my particular case, I want to make a specific change to all items within a range, filling in gaps and splitting existing ranges at either end if necessary. To do that currently, I have to clone the items from the specified range, store them somewhere else, change them, then insert them back in. That temporary storage also needs an allocation, which is something I'd rather avoid as this is performance sensitive code.

Range coalescing can be useful for speed, but in my case above with a mutation-heavy workload, that speed benefit may well be lost. It also has the drawback that values need to be Eq, and comparing values may be costly. I therefore propose an alternative:

  • No coalescing is done anymore. Two touching ranges can have the same value. Eq is no longer needed.
  • Mutable operations can now be added since there is no invariant to uphold.
  • Add a split method to split a range in two at a given key.
  • Add a merge method that does the coalescing manually. This method would have an Eq bound on the value.

This would give the user control over the coalescing behaviour, and they can decide to use it or not, or use it less frequently. It would also help with my case; I could now call split at the ends of the range, then call range_mut (to be written) to mutate the ranges in place.

I'm willing to implement these changes if wanted.

Hi @Rua,

Thanks for getting in touch, and sorry for not replying to this for (checks) nearly a year. 😐

As I just mentioned over on this other issue, I'm starting to consider that maybe I should support some option(s) for disabling/suspending coalescing. I'm wary of imposing a burden on users of the API that do want automatic coalescing, though (I still consider that as the primary use case this crate is meant to serve), so a few options I'm considering are:

  • A mode (either disable_coalescing or suspend_coalescing(|...| { ... })) for turning off coalescing, where turning it back on again implies a full pass over the collection to re-coalesce entries.
  • Similar to the above, but track the changes you make and do a more targeted fix-up at the end.
  • A lower-level collection that gives you more direct control, upon which the current always-coalescing one would be rebuilt on top of. But... the question then is what value does this collection provide over a plain old BTreeMap? Does it truncate/split existing ranges when you insert? Do all users want that?
  • Conversion to/from the underlying BTreeMap, letting you do whatever you want. I think this would imply either exposing more of the ugly guts of the crate (the RangeStartWrapper/RangeEndWrapper noise) or some very careful unsafe transmutes so the BTreeMap the user is given just has bare Ranges in it.

Most of this would be a lot easier when BTreeMap cursors land, and some of it is probably too slow to seriously consider without them (e.g. doing a pass over the whole collection when you're done making modifications). So I'm tempted to wait, but who knows how long until cursors are stable? It could be a long time, even if they prove to be relatively uncontroversial.

Do you have any thoughts on these options, or any other ideas since you first raised this? Maybe you've moved on since then and have no interest in this stuff anymore? 😅