Range Scan?

Question

Range Scan?

prologic opened this issue 5 years ago · comments

is there support for doing range scans?

For example:

start := []byte("foo_10")
end := []byte("foo_20")
trie.ForEachRange(start, end, func(node art.Node) bool {
    // do something with node
    return true
})

I think only prefix iteration is support currently? Would it be hard to add range iteration?

Pavel Larkin · Answer 1 · Sat Sep 21 2019 13:02:12 GMT+0800 (China Standard Time)

It is unclear for me from your example what you mean under "range". Could you define it?

You can try to use ForEachPrefix:

key := []byte("foo_")
trie.ForEachPrefix(key, func(node art.Node) bool {
    // validate here if node in your range...
    return true
})

James Mills · Answer 2 · Sat Sep 21 2019 15:27:51 GMT+0800 (China Standard Time)

Are you saying I can implement a range scan using the ForEachPrefix and testing that the node.Key() falls within my start and end?

Pavel Larkin · Answer 3 · Sat Sep 21 2019 17:04:15 GMT+0800 (China Standard Time)

Yes. How are you going to check if a node key falls in start, end range?

…

On Sat, Sep 21, 2019, 12:27 AM James Mills ***@***.***> wrote: Are you saying I can implement a range scan using the ForEachPrefix and testing that the node.Key() falls within my start and end? — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#11?email_source=notifications&email_token=AAARYI36NVM66T7LRRPYF4TQKXEHPA5CNFSM4IY4LRG2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7IMMJQ#issuecomment-533775910>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAARYIY6A55B5LAQQZHOOELQKXEHPANCNFSM4IY4LRGQ> .

James Mills · Answer 4 · Sat Sep 21 2019 19:56:23 GMT+0800 (China Standard Time)

I think its better if I just put up code that shows what I'm trying to accomplish.

See: https://github.com/prologic/bitcask/pull/101

Basically we are ranging over keys between a start and end key.

James Mills · Answer 5 · Sat Sep 21 2019 20:02:28 GMT+0800 (China Standard Time)

Example (using the bitcask shell):

$ git clone https://github.com/prologic/bitcask
Cloning into 'bitcask'...
remote: Enumerating objects: 69, done.
remote: Counting objects: 100% (69/69), done.
remote: Compressing objects: 100% (54/54), done.
remote: Total 790 (delta 26), reused 44 (delta 12), pack-reused 721
Receiving objects: 100% (790/790), 255.30 KiB | 453.00 KiB/s, done.
Resolving deltas: 100% (434/434), done.
$ cd bitcask
$ git checkout range_scan
Branch 'range_scan' set up to track remote branch 'range_scan' from 'origin'.
Switched to a new branch 'range_scan'
$ make
bitcask version v0.3.4@d03eb48
bitcaskd version v0.3.4@d03eb48
$ for i in $(seq 1 9); do ./bitcask -p ./tmp.db set foo_$i $i; done
$ ./bitcask -p ./tmp.db range foo_3 foo_7
3
4
5
6
7

Hope this helps makes things more clear

Pavel Larkin · Answer 6 · Sun Sep 22 2019 17:49:39 GMT+0800 (China Standard Time)

I see, you are using lexicographic comparison. See my idea below.

// tests here: https://play.golang.org/p/MsUL00KZqww
func longestCommonPrefix(start, end []byte) []byte {
	var min, max []byte
	var cmp = bytes.Compare(start, end) == -1
	if cmp {
		min, max = start, end
	} else {
		min, max = end, start
	}

	if len(min) == 0 {
		return nil
	}

	var i int
	for i = 0; i < len(min) && i < len(max) && min[i] == max[i]; i++ {
	}
	if i == 0 {
		return nil
	}

	return min[:i]
}

func executor(key, start, end []byte, f func(key []byte) error) (bool, error) {
	if bytes.Compare(key, start) < 0 || bytes.Compare(key, end) > 0 {
		return false, nil // contnue iteration
	}

	if err := f(key); err != nil {
		return false, err // stop iteration
	}

	return true, nil // continue iteration
}

// Range performs a range scan of keys matching a range of keys between the
// start key and end key and calling the function `f` with the keys found.
// If the function returns an error no further keys are processed and the
// first error returned.
func (b *Bitcask) Range(start, end []byte, f func(key []byte) error) (err error) {
	lcp := longestCommonPrefix(start, end)
	if lcp != nil {
                // we have the longest common prefix, lets use `ForEachPrefix`
                // to minimize number of iterations
		b.trie.ForEachPrefix(lcp, func(node art.Node) (ok bool) {
			ok, err = executor(node.Key(), start, end, f)
			return (!ok && err == nil) || ok
		})
	} else {
                // iterate over all nodes and check if node.key in the range
	        b.trie.ForEach(func(node art.Node) (ok bool) {
			ok, err = executor(node.Key(), start, end, f)
			return (!ok && err == nil) || ok
		})
	}
}

The above code was not tested. :)

James Mills · Answer 7 · Sun Sep 22 2019 18:16:18 GMT+0800 (China Standard Time)

Ahh I see. Compute the longest prefix of start and end and then use this as the starting point of the ForEachPRefix()? Makes sense! Is this something we can build into your library :D

James Mills · Answer 8 · Sun Sep 22 2019 18:17:22 GMT+0800 (China Standard Time)

And yes I guess I meant Range Scan with lexicographic comparison as the keys are byte slices so I'm not sure any other kind of range scan makes sense? :)

Pavel Larkin · Answer 9 · Sat Sep 28 2019 15:13:34 GMT+0800 (China Standard Time)

Kleene–Brouwer order? :)