Tessil / robin-map

C++ implementation of a fast hash map and hash set using robin hood hashing

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Maybe it would make sense for the self-insertion/emplacement to work (suggestion)

Predelnik opened this issue · comments

Currently the following code leads to crash/undefined behavior with tsl::robin_map:

int main ()
{
  tsl::robin_map<int, std::string> v;
  v[0] = "23";
  for (int i = 2; i < 3300; ++i)
    v[i] = v[0];
}

It happens because we pass the reference to our own element which gets destroyed during reallocation and since construction happens late it leads to the unfortunate results. I understand that this might be a design decision to treat it as UB however I have the following reason for suggesting to fix this:

  1. It works fine with node-based (std::) maps.

  2. It works with std::vector. I can't quote standard paragraph exactly but here's link to the statement from Microsoft of fixing this problem in their implementation https://devblogs.microsoft.com/cppblog/stl-fixes-in-vs-2017-rtm/ (Fixed aliasing bugs)

  3. It actually seems to be much more frequent operation for unordered map rather than for vector, simply copying a past value to the newly created field feels like natural operation in many cases.

  4. A fix seems to be the same as for vector since it is the structure robin map built upon so should not be that hard to implement.

Hi,

Thank you for the suggestion.

It's effectively an undefined behaviour. As specified in the documentation, any operation modifying the hash map may invalidate the iterators and references to the objects in the hash map. The v[i] operation may thus invalidate the v[0] parameter that it gets by reference.

I didn't know that v.push_back(v[0]) with an std::vector was allowed when v.capacity() == v.size(), I thought it was an undefined behaviour.

I have to see how this can be done as there are two problems:

  • A rehash may occur invalidating all the references to the elements in the hash map
  • A element in the hash map may change of place (and thus invalidating any reference to the element) due to the robin hood hashing mechanism. The inserted element may "steal" the bucket of an element and we then have to move the element that was in the bucket further away.

I have to think a bit if it can be done and how this can be done efficiently and come back to you in a couple of days.

For the second point I can just check if the element in the bucket I'm stealing hasn't the same address as the one I want to insert.

You can see the fix in the link:

Fixed aliasing bugs. For example, the Standard permits v.emplace_back(v[0]), which we were mishandling at runtime, and v.push_back(v[0]), which we were guarding against with deficient code (asking “does this object live within our memory block?” doesn’t work in general). The fix involves performing our actions in a careful order, so we don’t invalidate whatever we’ve been given. Occasionally, to defend against aliasing, we must construct an element on the stack, which we do only when there’s no other choice (e.g. emplace(), with sufficient capacity, not at the end).

So, if you know invalidation is going to happen, construct the element in the stack first, do the invalidation, then move the object from the stack into place.

Thank you, I checked a bit further the issue and it seems to be effectively possible without too much troubles.

I'll be a bit busy the coming week but I'll check to implement the fix when I can find some time.

Hello,

Working a bit on the issue I realized it's not really possible for the following code to work with an open-adressing implementation:

int main ()
{
  tsl::robin_map<int, std::string> v;
  v[0] = "23";
  for (int i = 2; i < 3300; ++i)
    v[i] = v[0];
}

In the loop, the v[0] expression is executed first and returns a reference to an element inside the bucket array. Afterwards the v[i] expression is executed which may incur a rehash of the bucket array without being aware of v[0]. If it happens, the previous reference we got with v[0] is no more valid and the assignment operation will be undefined.

The best that can be done is to make the following code valid by being careful on rehash:

int main ()
{
  tsl::robin_map<int, std::string> v;
  v[0] = "23";
  for (int i = 2; i < 3300; ++i)
    v.try_emplace(i, v[0]);
}

Sorry for late response but I do think that v.try_emplace(i, v[0]); is enough actually, since it will be sort of analogous to v.push_back(v[0]) for vector. While original case might still lead to surprising results for some people, I guess it's similar to possible troubles due to instability of references in case of robin_map versus standard maps.