map::tree_bins::concurrent_tree_bin: attempt to subtract with overflow
jonhoo opened this issue · comments
Hit two of these at the same time. This is after #85.
test map::tree_bins::concurrent_tree_bin ...
thread '<unnamed>' panicked at 'attempt to subtract with overflow', src/map.rs:1163:17
stack backtrace:
...
13: core::panicking::panic
at src/libcore/panicking.rs:54
14: flurry::map::HashMap<K,V,S>::add_count
at src/map.rs:1163
15: flurry::map::HashMap<K,V,S>::replace_node
at src/map.rs:2626
16: flurry::map::HashMap<K,V,S>::remove
at src/map.rs:2366
17: flurry::map::tree_bins::concurrent_tree_bin::{{closure}}
at src/map.rs:3429
thread '<unnamed>' panicked at 'attempt to add with overflow', src/map.rs:1159:17
...
13: core::panicking::panic
at src/libcore/panicking.rs:54
14: flurry::map::HashMap<K,V,S>::add_count
at src/map.rs:1159
15: flurry::map::HashMap<K,V,S>::put
at src/map.rs:1970
16: flurry::map::HashMap<K,V,S>::insert
at src/map.rs:1625
17: flurry::map::tree_bins::concurrent_tree_bin::{{closure}}
at src/map.rs:3419
Current best guess for the order of events would be as follows:
- We have a (regular) bin with 1 element, which is the only element in the map (`count == 1`).
- That element gets removed by thread 1, which is paused before the call to `add_count`.
- An element for the same key is inserted by thread 2, which also is paused before `add_count` (note that all the count updates happen outside of the respective critical sections of the corresponding methods).
- Thread 3 now removes this element again, and decrements `count` to `0`.
- Thread 1 gets to run again and decrements `count` to `usize::MAX`.
- Thread 2 gets to run and increments `count` to `0`.
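The interleaving above can be replayed deterministically on a single thread. The following is only a sketch and not flurry's code: `replay` is a hypothetical helper that applies the count updates in the order they land and reports the first step at which an unsigned count would underflow or overflow.

```rust
/// Hypothetical helper (not part of flurry): applies each signed delta to an
/// unsigned count in order. Returns the final count, or the index of the
/// first step that would underflow or overflow.
fn replay(start: usize, deltas: &[isize]) -> Result<usize, usize> {
    let mut count = start;
    for (step, &delta) in deltas.iter().enumerate() {
        let next = if delta < 0 {
            count.checked_sub(delta.unsigned_abs())
        } else {
            count.checked_add(delta as usize)
        };
        count = next.ok_or(step)?;
    }
    Ok(count)
}

fn main() {
    // Order in which the updates land, starting from count == 1:
    // thread 3's remove, thread 1's delayed remove, thread 2's delayed insert.
    let deltas: [isize; 3] = [-1, -1, 1];

    // The second update (index 1) would take an unsigned count below zero.
    assert_eq!(replay(1, &deltas), Err(1));

    // With wrapping arithmetic the count passes through usize::MAX and
    // ends at 0, matching the last steps of the scenario.
    let mut wrapped = 1usize;
    let mut seen = Vec::new();
    for d in deltas {
        wrapped = wrapped.wrapping_add(d as usize);
        seen.push(wrapped);
    }
    assert_eq!(seen, vec![0, usize::MAX, 0]);
}
```

If the insert's update had landed before the delayed remove's, the same three deltas would never leave the valid range, which is why the failure only shows up under particular schedules.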
As of yet I'm unsure why this would be a problem for us but not for the Java implementation. There is a `validated` boolean in Java's `replaceNode` which is omitted in our implementation due to `match`/`continue`, but I don't see how that would be the culprit.
The Java implementation stores all the size information as `long`, so it's possible that they just allow this. See also their implementation of `size`, which essentially clamps the actual computed value to between `0` and `Integer.MAX_VALUE`. The tree bin test may just be the first to trigger this for us.
It is also possible that the shared counters, which we don't yet have, influence this; there seems to be some kind of contention detection there. There is also this annotation on the counter cells. But I think it is still possible for the computed value to be negative upon a call to `size`, and that they use `long` just so they can perform bounds checks on `int`.
That's fascinating... I mean, I suppose we could just move it to an `AtomicIsize` instead... I guess they decided it wasn't worth the cost to keep the count accurate at all times. Mind sending a PR?
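A minimal sketch of what the `AtomicIsize` idea might look like, assuming a standalone counter type. `Counter` and its `len` method are hypothetical and are not flurry's actual types; only `add_count` borrows its name from the backtrace above, with a simplified signature.

```rust
use std::sync::atomic::{AtomicIsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for the map's element counter.
struct Counter {
    count: AtomicIsize,
}

impl Counter {
    fn new() -> Self {
        Counter { count: AtomicIsize::new(0) }
    }

    /// Applies a signed delta. The stored value may be transiently negative
    /// when a removal's update lands before the matching insertion's update.
    fn add_count(&self, delta: isize) {
        self.count.fetch_add(delta, Ordering::SeqCst);
    }

    /// Reports the count, treating a transiently negative value as 0.
    fn len(&self) -> usize {
        self.count.load(Ordering::SeqCst).max(0) as usize
    }
}

fn main() {
    // A removal's update arriving first no longer overflows.
    let counter = Counter::new();
    counter.add_count(-1);
    assert_eq!(counter.len(), 0);
    counter.add_count(1);
    assert_eq!(counter.len(), 0);

    // Concurrent out-of-order updates still settle on the right total.
    let counter = Arc::new(Counter::new());
    counter.add_count(10);
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1000 {
                    c.add_count(-1); // removal's update lands first
                    c.add_count(1); // insertion's update follows
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(counter.len(), 10);
}
```

The trade-off is the one mentioned above: the count is not accurate at every instant, only once all pending updates have been applied.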
Sure. I'll put together a PR for this and one for #83 when I'm back home. Should work out to fit in tomorrow.
I can confirm that this was fixed by #88 (perhaps unsurprisingly) after having run it in a loop for a while.