Expose non-replacing `insert`

Question

jonhoo opened this issue 4 years ago · comments

By calling put with no_replacement = true, we can provide a version of insert that does not update the value if the key is already present. I can imagine that being a relatively useful method to provide. I'm not sure what to call it though? std::collections::HashMap does not have an equivalent, so we're on our own here.

llitz · Answer 1 · Tue Feb 04 2020 16:24:27 GMT+0800 (China Standard Time)

noreplace comes to mind but that's too cheesy

Perhaps insertifnull

DQ · Answer 2 · Tue Feb 04 2020 18:20:13 GMT+0800 (China Standard Time)

A lot of the APIs which deal with concurrency use try_ for methods that depend on state shared between multiple threads. For example, a lot of the locks have a try_lock() method which returns false if the lock is already held by another thread. Such methods can also be found in places where an operation might not succeed, such as try_from for fallible conversions.
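To illustrate the `try_` convention with the Rust standard library, here is a minimal sketch using `std::sync::Mutex::try_lock` and `u8::try_from`. Note that in Rust's std, `try_lock` reports failure through a `Result` rather than a `bool`. The helper names `can_lock`, `to_byte`, and `example` are made up for this sketch.

```rust
use std::convert::TryFrom;
use std::sync::Mutex;

/// Returns true if the lock could be acquired without blocking.
fn can_lock(m: &Mutex<i32>) -> bool {
    // `try_lock` returns `Err` instead of blocking when the mutex is held.
    m.try_lock().is_ok()
}

/// Fallible conversion: values above 255 do not fit in a `u8`.
fn to_byte(n: u32) -> Option<u8> {
    u8::try_from(n).ok()
}

/// Shows both outcomes of `try_lock`: first free, then already held.
fn example() -> (bool, bool) {
    let m = Mutex::new(0);
    let free = can_lock(&m); // nothing holds the lock yet
    let guard = m.lock().unwrap(); // hold the lock
    let held = can_lock(&m); // fails: the lock is already held
    drop(guard);
    (free, held)
}
```

In both cases the operation may simply not happen, and the return value tells the caller which outcome occurred.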

As I assume we have a similar use case for the proposed method (i.e. inserting if no value for the given key is currently stored by a different thread) and it is definitely a case of an operation that might not happen depending on the state of the map, I suggest following this naming scheme and calling the method try_insert.

Edit: I also just think the name would be very intuitive ^^

Jon Gjengset · Answer 3 · Tue Feb 04 2020 23:06:53 GMT+0800 (China Standard Time)

Oh, wow, yes, try_insert is a great name!

llitz · Answer 4 · Tue Feb 04 2020 23:27:26 GMT+0800 (China Standard Time)

try_insert feels weird compared to the other APIs. For example, if we call try_lock() and it fails, the lock wasn't acquired. But if we call try_insert() and the insert fails, the data is there anyway.

It feels like the wrong name based on what I usually expect from try: something that can fail. If the data is already there, is it a failure?

DQ · Answer 5 · Tue Feb 04 2020 23:35:24 GMT+0800 (China Standard Time)

The proposal only mentions checking if an entry already exists for the given key. The map may still contain a value different from what you try to insert, so in this case you "failed" to store your provided value in the map. So while there might be data in the map after such a failed try_insert, it is not generally the data you wanted it to contain. That's how I think of this at least.

We can also discuss whether we should check the already stored value against the given value and what to return in case they match.

llitz · Answer 6 · Tue Feb 04 2020 23:40:23 GMT+0800 (China Standard Time)

Hmmm, you are right, try_insert makes a lot of sense once you consider that something else can already be there.

Checking the value may have a high cost, but could be necessary to decide what to return.

Jon Gjengset · Answer 7 · Wed Feb 05 2020 00:12:31 GMT+0800 (China Standard Time)

I don't think we should check for equality of the value.

One alternative is to rename the current insert to upsert, which is a short-form often used for "update or insert", and then call this method insert. But that's probably more confusing, since we then would not match the std API.

DQ · Answer 8 · Wed Feb 05 2020 00:26:59 GMT+0800 (China Standard Time)

I agree that checking for the value is unnecessary. I do think however that this method should return not only information about whether the insert succeeded, but also the previous value in case of failure (which is returned by put anyway). That way, the caller can check himself whether the value blocking his insert matches his value or not if he requires this.

So we could simply propagate the return of put like insert does (in which case the caller knows he was successful if he reads None as previous value), or we could use something like a Result to indicate success. While the Option contains all relevant information, I personally feel it is confusing to obtain None on success and Some on failure, so I'd lean towards the second alternative.

Jon Gjengset · Answer 9 · Wed Feb 05 2020 00:31:06 GMT+0800 (China Standard Time)

I agree, returning a Result with the old value in Err seems like a good solution.
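A rough sketch of that return shape, written against the single-threaded `std::collections::HashMap` rather than the concurrent map discussed here (so there is no `guard` and no call to `put`). The free function `try_insert` and its concrete key and value types are assumptions for illustration only.

```rust
use std::collections::hash_map::{Entry, HashMap};

/// Inserts `value` only if `key` is absent.
/// On failure, `Err` carries a reference to the value already stored.
fn try_insert<'a>(
    map: &'a mut HashMap<String, i32>,
    key: &str,
    value: i32,
) -> Result<(), &'a i32> {
    match map.entry(key.to_string()) {
        Entry::Vacant(slot) => {
            slot.insert(value);
            Ok(())
        }
        Entry::Occupied(slot) => {
            // `into_mut` ties the reference to the map's lifetime.
            let current: &'a i32 = slot.into_mut();
            Err(current)
        }
    }
}

/// First insert succeeds, second one is rejected and the map is unchanged.
fn example() -> (Result<(), i32>, Result<(), i32>, i32) {
    let mut map = HashMap::new();
    let first = try_insert(&mut map, "a", 1).map_err(|v| *v);
    let second = try_insert(&mut map, "a", 2).map_err(|v| *v);
    (first, second, map["a"])
}
```

With this shape the caller can compare the blocking value against their own if they need to, without the map doing an equality check itself.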

Josh Stone · Answer 10 · Sat Feb 08 2020 01:35:45 GMT+0800 (China Standard Time)

> std::collections::HashMap does not have an equivalent, so we're on our own here.

I think you would use map.entry(key).or_insert(value).
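For reference, a small sketch of that pattern with the standard `HashMap`; the function name `example` is only for illustration.

```rust
use std::collections::HashMap;

/// Inserts through the entry API twice for the same key.
fn example() -> (i32, i32) {
    let mut map: HashMap<&str, i32> = HashMap::new();
    // Key absent: the value is inserted and a reference to it is returned.
    let first = *map.entry("a").or_insert(1);
    // Key present: the stored value is kept and `2` is dropped.
    let second = *map.entry("a").or_insert(2);
    (first, second)
}
```

Note that `or_insert` returns a reference to whatever ends up in the map, so it does not by itself tell the caller whether the insert happened.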

Jon Gjengset · Answer 11 · Sat Feb 08 2020 02:12:49 GMT+0800 (China Standard Time)

Sorry, yes, I meant as a free-standing method. The Entry API would be great to support, but we have other issues there sadly.

Josh Stone · Answer 12 · Sat Feb 08 2020 02:23:33 GMT+0800 (China Standard Time)

Then what about insert_if_absent as proposed in #12 (comment)?
The implementation could just be map.compute_if_absent(key, || value, guard).

Jon Gjengset · Answer 13 · Sat Feb 08 2020 02:58:45 GMT+0800 (China Standard Time)

I think the implementation is just

self.put(key, value, true, guard)

We don't currently have compute_if_absent (only compute_if_present), but yes, that would also work.

Between insert_if_absent and try_insert, I'm not sure which one I prefer to be honest. I'd be fine with either I think. I'm not a huge fan of the *_if_absent + *_if_present naming, but it does match ConcurrentHashMap, so 🤷‍♂️

Gark Garcia · Answer 14 · Mon Mar 23 2020 05:46:55 GMT+0800 (China Standard Time)

@jonhoo I'm working on this right now. What exactly does HashMap::put return?

I think naming the new method try_insert and returning a Result<V, V> or Result<(), V> would be a better approach. Anyway, I'll go with whatever you guys decide.

Gark Garcia · Answer 15 · Mon Mar 23 2020 06:32:51 GMT+0800 (China Standard Time)

I believe Result<&T, &T> is the ideal return type for this method. That way map.try_insert(key, val, guard) would return Ok(&val) if map doesn't have an entry for key, and Err(&old_value) otherwise.

However, val is consumed by HashMap::put before the function returns, so returning a reference to it would require some extra work. So maybe we should just return a Result<(), &T> (to match std's try_whatever type signature)?

Jon Gjengset · Answer 16 · Mon Mar 23 2020 20:46:04 GMT+0800 (China Standard Time)

I don't think we want to make HashMap::put return Result, since Err(old_value) would imply that the put did not happen, when that is not the case for put. I do think it makes sense for Result to be the return type of try_insert (if that's what we call it), since Err() there really does mean that the operation didn't do anything. It may be worth adding a new (internal-only) enum to provide a more fine-grained return type for HashMap::put; that way we could also have a NotReplaced(T) variant to use when no_replacement = true, which gives back the T.
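A minimal sketch of what such an internal enum could look like, modelled on top of `std::collections::HashMap` instead of the real concurrent `put`. Only `NotReplaced(T)` and the `no_replacement` flag come from the discussion; the enum name `PutResult`, the other variants, and the simplified signatures are assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical internal result type for `put`.
#[derive(Debug, PartialEq)]
enum PutResult<T> {
    /// The key was absent and the value was stored.
    Inserted,
    /// The key was present and its value was replaced; carries the old value.
    Replaced(T),
    /// The key was present and `no_replacement` was set; gives the new value back.
    NotReplaced(T),
}

/// Simplified stand-in for the map's internal `put`.
fn put(map: &mut HashMap<String, i32>, key: &str, value: i32, no_replacement: bool) -> PutResult<i32> {
    if no_replacement && map.contains_key(key) {
        return PutResult::NotReplaced(value);
    }
    match map.insert(key.to_string(), value) {
        Some(old) => PutResult::Replaced(old),
        None => PutResult::Inserted,
    }
}

/// Replacing insert: returns the previous value, if any.
fn insert(map: &mut HashMap<String, i32>, key: &str, value: i32) -> Option<i32> {
    match put(map, key, value, false) {
        PutResult::Replaced(old) => Some(old),
        _ => None,
    }
}

/// Non-replacing insert: `Err` hands back the value that was not stored.
fn try_insert(map: &mut HashMap<String, i32>, key: &str, value: i32) -> Result<(), i32> {
    match put(map, key, value, true) {
        PutResult::NotReplaced(v) => Err(v),
        _ => Ok(()),
    }
}
```

The public methods then translate the fine-grained enum into `Option` or `Result`, so `put` itself never has to claim that it "failed".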

I'm not entirely sure what the return semantics of HashMap::try_insert should be. My instinct is that the Ok type should be (), and that the Err type should be T, the value that was not inserted. Returning &T with the current value (which caused the try_insert to fail) also makes sense to me though. Perhaps what we really want is

struct TryInsertError<'a, T> {
    current: &'a T, // the value already in the map
    new: T,         // the value that was not inserted
}

The names here are inspired by crossbeam_epoch::CompareAndSetError.
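A sketch of how an error type of this shape could be returned, again using the single-threaded `std::collections::HashMap` as a stand-in and an explicit lifetime on the borrowed field. The free function `try_insert`, the `String` key type, and the `()` success type are assumptions, not the signature that was eventually merged.

```rust
use std::collections::hash_map::{Entry, HashMap};

/// Error for a rejected insert: both the blocking value and the rejected one.
#[derive(Debug, PartialEq)]
struct TryInsertError<'a, V> {
    /// The value already stored under the key.
    current: &'a V,
    /// The value that was not inserted, handed back to the caller.
    new: V,
}

/// Inserts `value` only if `key` is absent.
fn try_insert<'a, V>(
    map: &'a mut HashMap<String, V>,
    key: &str,
    value: V,
) -> Result<(), TryInsertError<'a, V>> {
    match map.entry(key.to_string()) {
        Entry::Vacant(slot) => {
            slot.insert(value);
            Ok(())
        }
        Entry::Occupied(slot) => {
            let current: &'a V = slot.into_mut();
            Err(TryInsertError { current, new: value })
        }
    }
}

/// Second insert for the same key fails and returns both values.
fn example() -> (String, String) {
    let mut map = HashMap::new();
    try_insert(&mut map, "k", String::from("first")).unwrap();
    let err = try_insert(&mut map, "k", String::from("second")).unwrap_err();
    (err.current.clone(), err.new)
}
```

Handing `new` back means a rejected value is not lost, which matters when `V` is expensive to build or not `Clone`.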

DQ · Answer 17 · Mon Apr 13 2020 17:30:31 GMT+0800 (China Standard Time)

This seems to be covered now with #74. Closing the issue.