tidwall / hashmap

A simple and efficient hashmap package for Go. Open addressing, robin hood hashing, and xxh3 algorithm. Supports generics.

potential infinite loop in Get

Yiling-J opened this issue

First, thanks for this great package! I'm adding it to my cache package because it's faster, but some of my tests hang forever on Get, so I took a look at the source code:

Currently the Get method uses a for loop without a condition:

hashmap/map.go

Line 159 in a89df9d

for {

I think this may cause an infinite loop if it can't find a bucket matching the given key.

I also took a look at some other implementations, for example this one:

https://github.com/goossaert/hashmap/blob/17174253474d140fcdec6d43f656ed998a824320/backshift_hashmap.cc#L38

This hashmap is designed to always have 15% of its buckets available.
This means there should be multiple buckets where the dib is zero, ensuring that the loop will always terminate at some point.

for {
	if m.buckets[i].dib() == 0 {
		return value, false
	}
	if m.buckets[i].hash() == hash && m.buckets[i].key == key {
		return m.buckets[i].value, true
	}
	i = (i + 1) & m.mask
}

It's certainly possible to add a probemax, equal to the number of buckets, as a loop condition. This may fix the problem you are running into. But I'm wondering why there are no empty buckets in your case. The Set operation is supposed to trigger a resize when the hashmap goes above 85% full.

I would like to find the root cause.

Would it be possible for you to provide a reproducible example that causes this issue?
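
For illustration, the probemax idea mentioned above could look roughly like the following minimal sketch. The slot type and the boundedGet and fullTable functions are hypothetical and not part of the hashmap package. The only assumptions carried over from the thread are that a dib of 0 marks an empty bucket and that the table length is a power of two. The real Get also compares hashes, which is omitted here for brevity.

```go
package main

import "fmt"

// slot is a hypothetical, simplified bucket. As in the Get loop quoted
// in this thread, dib == 0 marks an empty slot.
type slot struct {
	key   string
	value int
	dib   int
}

// boundedGet probes linearly from index start, but gives up after
// len(slots) probes, so it terminates even if no slot is empty.
// It assumes len(slots) is a power of two.
func boundedGet(slots []slot, start int, key string) (int, bool) {
	mask := len(slots) - 1
	i := start & mask
	for probes := 0; probes < len(slots); probes++ {
		if slots[i].dib == 0 {
			return 0, false // reached an empty slot: key is absent
		}
		if slots[i].key == key {
			return slots[i].value, true
		}
		i = (i + 1) & mask
	}
	return 0, false // visited every slot without finding the key
}

// fullTable returns a table with no empty slot, the state in which an
// unbounded probe loop would spin forever on a missing key.
func fullTable() []slot {
	return []slot{{"a", 1, 1}, {"b", 2, 1}, {"c", 3, 1}, {"d", 4, 1}}
}

func main() {
	tbl := fullTable()
	fmt.Println(boundedGet(tbl, 2, "a"))       // 1 true
	fmt.Println(boundedGet(tbl, 2, "missing")) // 0 false
}
```

A bound like this only guarantees termination. It would not explain why the table ended up without empty buckets in the first place, which is the question raised above.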

It's a little hard to reproduce with a simple test, so I recorded some data:

func (m *Map[K, V]) Get(key K) (value V, ok bool) {
	if len(m.buckets) == 0 {
		return value, false
	}
	hash := m.hash(key)
	i := hash & m.mask
	fmt.Println("start loop", m.cap, m.length)
	fmt.Println("get", key, i)
	defer fmt.Println("finish loop")
	for {
		fmt.Println(m.buckets[i].key, m.buckets[i].dib())
		if m.buckets[i].dib() == 0 {
			return value, false
		}
		if m.buckets[i].hash() == hash && m.buckets[i].key == key {
			return m.buckets[i].value, true
		}
		i = (i + 1) & m.mask
	}
}
start loop 50 8
get key:3810 4
key:564 3
key:109645 1
key:133049 2
key:1868412 2
key:30196093 3
key:2244564 3
key:24 4
key:1826479 5
key:564 3
key:109645 1
key:133049 2
key:1868412 2
key:30196093 3
key:2244564 3
key:24 4
key:1826479 5
key:564 3
key:109645 1
key:133049 2
key:1868412 2
key:30196093 3
key:2244564 3
key:24 4
key:1826479 5
key:564 3
key:109645 1
key:133049 2
key:1868412 2
key:30196093 3
key:2244564 3
key:24 4
key:1826479 5
key:564 3
key:109645 1
key:133049 2
key:1868412 2
key:30196093 3
key:2244564 3
key:24 4
key:1826479 5
key:564 3
key:109645 1
key:133049 2
key:1868412 2
key:30196093 3
key:2244564 3
key:24 4
key:1826479 5
key:564 3
key:109645 1
key:133049 2
key:1868412 2
key:30196093 3

I would like to know what data you gave to the Set command, and in what order.

Using the data you provided above, here's my attempt. This test passed on my side.

func TestIssue3(t *testing.T) {
	pairs := []struct {
		k string
		v int
	}{{"key:564", 3}, {"key:109645", 1}, {"key:133049", 2},
		{"key:1868412", 2}, {"key:30196093", 3}, {"key:2244564", 3},
		{"key:24", 4}, {"key:1826479", 5}, {"key:564", 3}, {"key:109645", 1},
		{"key:133049", 2}, {"key:1868412", 2}, {"key:30196093", 3},
		{"key:2244564", 3}, {"key:24", 4}, {"key:1826479", 5}, {"key:564", 3},
		{"key:109645", 1}, {"key:133049", 2}, {"key:1868412", 2},
		{"key:30196093", 3}, {"key:2244564", 3}, {"key:24", 4},
		{"key:1826479", 5}, {"key:564", 3}, {"key:109645", 1},
		{"key:133049", 2}, {"key:1868412", 2}, {"key:30196093", 3},
		{"key:2244564", 3}, {"key:24", 4}, {"key:1826479", 5}, {"key:564", 3},
		{"key:109645", 1}, {"key:133049", 2}, {"key:1868412", 2},
		{"key:30196093", 3}, {"key:2244564", 3}, {"key:24", 4},
		{"key:1826479", 5}, {"key:564", 3}, {"key:109645", 1},
		{"key:133049", 2}, {"key:1868412", 2}, {"key:30196093", 3},
		{"key:2244564", 3}, {"key:24", 4}, {"key:1826479", 5}, {"key:564", 3},
		{"key:109645", 1}, {"key:133049", 2}, {"key:1868412", 2},
		{"key:30196093", 3},
	}
	var m Map[string, int]
	for _, p := range pairs {
		m.Set(p.k, p.v)
	}
	for _, p := range pairs {
		v, _ := m.Get(p.k)
		if v != p.v {
			t.Fatal()
		}
	}
}

This test case hangs on my computer:

func TestHashmapLoop(t *testing.T) {
	m := hashmap.New[string, string](50)
	seq := "GET/key:808943|GET/key:808943|SET/key:808943|GET/key:5834|GET/key:5834|SET/key:5834|GET/key:51630|GET/key:51630|SET/key:51630|GET/key:49504|GET/key:49504|SET/key:49504|GET/key:346528|GET/key:346528|SET/key:346528|GET/key:189743|GET/key:189743|SET/key:189743|GET/key:4112608|GET/key:4112608|SET/key:4112608|GET/key:21749|GET/key:21749|SET/key:21749|GET/key:844131|GET/key:844131|SET/key:844131|GET/key:827464|DELETE/key:844131|GET/key:827464"
	for _, op := range strings.Split(seq, "|") {
		tp := strings.Split(op, "/")
		switch tp[0] {
		case "GET":
			m.Get(tp[1])
		case "SET":
			m.Set(tp[1], tp[1])
		case "DELETE":
			m.Delete(tp[1])
		}
	}

}

Thanks! I just pushed a change that I believe fixes the issue.