abseil / abseil-cpp

Abseil Common Libraries (C++)

Home Page:https://abseil.io

Inconsistency between find and traversing with iterators

gabrieltanase42 opened this issue · comments

Hi all,
I recently replaced the hash tables in my code with Abseil's flat_hash_map, and some of my tests now show abnormal behavior: looking up an element with find reports that it is not there, yet when I print every key in the container the element is actually present.

      typedef absl::flat_hash_map<H/*hashCode*/, I/*indexIntoColumn*/ 
           , absl::container_internal::hash_default_hash<H> 
           , absl::container_internal::hash_default_eq<H> 
           , Allocator<std::pair<const H, I>> // Our own Allocator
           > map_t;

        auto hit = hashIndexA.map.find(hashCode);
        if (hit == hashIndexA.map.end()) {
            cout<<&hashIndexA<<&(hashIndexA.map)<<"::element not found:"<<hashCode<<"\n";
            for (auto mit = hashIndexA.begin(); mit != hashIndexA.end();++mit)
                cout<<"HASHINDEX:"<<mit->first<<"::"<<mit->second<<"\n";
        }

sample output:
0xffff62a2d0d00xffff62a2d128::element not found:-1574725910
HASHINDEX:-1574725910::2
HASHINDEX:-1574733349::5
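
One way to turn the symptom above into an automated check is to compare find against a linear scan. The sketch below is a hypothetical helper (FindAgreesWithScan is not part of Abseil) and assumes only that the map type exposes find, begin, and end. It includes std::unordered_map only so that it can be tried without extra dependencies.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>

// Hypothetical diagnostic helper: returns true when find() and a full
// iterator scan agree on whether `key` is present in `map`.
template <class Map, class Key>
bool FindAgreesWithScan(const Map& map, const Key& key) {
  const bool found_by_find = map.find(key) != map.end();
  const bool found_by_scan = std::any_of(
      map.begin(), map.end(),
      [&](const typename Map::value_type& kv) { return kv.first == key; });
  return found_by_find == found_by_scan;
}

// Example use (std::unordered_map stands in for the real table type):
//   std::unordered_map<std::int32_t, std::int32_t> index = {{-1574725910, 2}};
//   bool ok = FindAgreesWithScan(index, -1574725910);
```

A false result from this helper would reproduce the inconsistency described here: the key is visible through the iterators but not through find.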

This is with C++11 and Abseil lts_2020_09_23. The only possibly unusual thing is that the map is created in one thread and may be read in a separate thread. There is no concurrent access: the table is fully built by one thread, and only later does the other thread read the data.

Perhaps by luck, my tests pass when I replace the default hashing function with std::hash.
The tests also always pass in debug mode; the issue appears only when building with -g -O2.

I was wondering whether anybody else has hit this issue in the past, and whether there are any flags that can be set?

Are there any thread-local storage issues that could affect the behavior in my scenario?

Unfortunately I don't have a standalone reproducer. For me these are hundreds of complex tests doing joins, distinct, and group by in the context of a database.

@tituswinters I updated the original post with the table definition. I tried to use the defaults, except for the allocator, which is our own custom one. Also, in the case you describe, I suspect I would see the failure all the time, including in debug mode, right?

For us the key is a signed int32 and the value is also a signed int32.

Is it possible that when I insert the values in thread 1, a key such as -1574725910 is inserted with one hash value, and then when I look up -1574725910 in the other thread it is not found because the hash computed there differs from the one computed on the inserting thread?

Are you using dynamic loading? If so, see #834.

I was able to write a small wrapper hash functor that calls the Abseil hasher, prints the value, and returns it.
It does indeed seem that for key = (int32) 32 it generates different hashes when inserting versus when reading.

// when inserting keys 32, 33, 34, 35
THID:139985095190272::HASH of (32) is :584646051168367060
THID:139985095190272::HASH of (32) is :584646051168367060
THID:139985095190272::HASH of (32) is :584646051168367060
THID:139985095190272::HASH of (33) is :11960714558955734653
THID:139985095190272::HASH of (33) is :11960714558955734653
THID:139985095190272::HASH of (34) is :4890038993052113671
THID:139985095190272::HASH of (34) is :4890038993052113671
THID:139985095190272::HASH of (34) is :4890038993052113671
THID:139985095190272::HASH of (35) is :16266107500834990510
THID:139985095190272::HASH of (35) is :16266107500834990510

// right before the lookup for 32
THID:139985095190272::HASH of (32) is :10263475499543341126
0x7f50a8a3b0d0:0x7f50a8a3b128 not found 32

Do I have any other options for configuring this besides using std::hash?
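
For illustration, one alternative to std::hash is a hand-written hasher whose result depends only on the key's value. The sketch below is an assumption, not something suggested in this thread: FixedInt32Hash is a hypothetical name, and the mixing steps are the well-known splitmix64 finalizer rather than anything taken from Abseil. It is shown with std::unordered_map so that it compiles on its own.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical seed-free hasher for 32-bit integer keys. The output depends
// only on the key's value, never on an address, a thread, or the shared
// object that evaluates it.
struct FixedInt32Hash {
  std::size_t operator()(std::int32_t key) const noexcept {
    std::uint64_t x = static_cast<std::uint32_t>(key);
    x += 0x9e3779b97f4a7c15ULL;  // splitmix64 increment
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return static_cast<std::size_t>(x);
  }
};

// Stand-in table type; the hasher goes in the Hash template parameter.
typedef std::unordered_map<std::int32_t, std::int32_t, FixedInt32Hash> IndexMap;
```

With absl::flat_hash_map the same functor would go in the third (Hash) template parameter, in place of hash_default_hash<H> in the typedef from the original post. Whether its distribution is good enough for a given workload would need to be measured.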

And indeed my issue seems very similar to #834.

template <class Key>
struct myhash {
    absl::container_internal::hash_default_hash<Key> hasher;
    typedef Key argument_type;
    typedef size_t result_type;
    size_t operator()(const Key& k) const {
        size_t hval = hasher(k);
        cout << "THID:" << std::this_thread::get_id() << "::HASH of (" << k << ") is :" << hval << "\n";
        return hval;
    }
};

(I work on the same team as Gabriel.) Would it be possible either to replace kSeed with a compile-time constant or remove it altogether? It seems strange to me to accept "dynamic loading just doesn't work" (per #834), just to "prevent having users depend on the particular hash values."

After all, multiple processes accessing the same hash table in shared memory, as well as processes using multiple .so's, depend on the particular hash values! The whole point of a hash function is that, for the hash's lifetime, passing the same input to the same hash function yields the same hash. I find it weird to break this fundamental requirement (for multi-process programs, as well as single-process programs implemented across multiple .so's) just so users can't depend on it...

Also, since kSeed is initialized to &kSeed, it's only random (as the comments point out) on systems that provide ASLR... and there, the randomness is limited to the ASLR range. (For example, since CPUs currently don't use all 64 bits for virtual memory addresses, the high bits of kSeed would always be zero...) The comments suggest that rolling a "real" random number would be too computationally expensive, but that makes no sense to me since kSeed is initialized statically.
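
To make the effect of an address-derived seed concrete, here is a minimal model. It is not Abseil's actual hashing code: Mix, SeededHash, kSeedCopyA, and kSeedCopyB are hypothetical names, and the two static variables merely stand in for the separate copy of the seed variable that each shared object would carry.

```cpp
#include <cstdint>

// Bijective 64-bit mixer (splitmix64 finalizer), used only for illustration.
inline std::uint64_t Mix(std::uint64_t x) {
  x ^= x >> 30;
  x *= 0xbf58476d1ce4e5b9ULL;
  x ^= x >> 27;
  x *= 0x94d049bb133111ebULL;
  x ^= x >> 31;
  return x;
}

// Stand-ins for the per-shared-object copies of the seed variable.
// Distinct objects have distinct addresses, so the two seeds differ.
static const char kSeedCopyA = 0;
static const char kSeedCopyB = 0;

// Hash a key using the address of a seed variable as the seed value.
inline std::uint64_t SeededHash(const void* seed_addr, std::int32_t key) {
  const std::uint64_t seed =
      static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(seed_addr));
  return Mix(seed ^ static_cast<std::uint32_t>(key));
}
```

In this model, SeededHash(&kSeedCopyA, 32) always equals itself but never equals SeededHash(&kSeedCopyB, 32), because the mixer is a bijection and the seeds differ. A table filled through one copy and probed through the other would therefore look in the wrong place, which matches the log above.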

For all these reasons, I propose either eliminating kSeed altogether, or else initializing it to a compile-time constant. Hash tables in shared memory are a common use case, and hash tables shared across multiple .so's seem like a reasonable use case as well. It feels wrong to reject both use cases just to prevent users from assuming that hash values won't change.

Would it be possible either to replace kSeed with a compile-time constant or remove it altogether?

Sorry, but no.