Seek iterator with Reverse doesn't work
brk0v opened this issue · comments
Code:
package main
import (
"fmt"
"io/ioutil"
"log"
"os"
"github.com/dgraph-io/badger"
)
func main() {
dir, err := ioutil.TempDir("", "badger")
if err != nil {
log.Fatal(err)
}
defer os.RemoveAll(dir)
opts := badger.DefaultOptions
opts.Dir = dir
opts.ValueDir = dir
db, err := badger.Open(opts)
if err != nil {
log.Fatal(err)
}
defer db.Close()
bkey := func(i int) []byte {
return []byte(fmt.Sprintf("%09d", i))
}
bval := func(i int) []byte {
return []byte(fmt.Sprintf("%025d", i))
}
txn := db.NewTransaction(true)
// Fill in 1000 items
n := 1000
for i := 0; i < n; i++ {
err := txn.Set(bkey(i), bval(i))
if err != nil {
log.Fatal(err)
}
}
err = txn.Commit(nil)
if err != nil {
log.Fatal(err)
}
opt := badger.DefaultIteratorOptions
opt.PrefetchSize = 10
opt.Reverse = true
// Iterate over 1000 items
var count int
err = db.View(func(txn *badger.Txn) error {
it := txn.NewIterator(opt)
for it.Seek([]byte("0")); it.ValidForPrefix([]byte("0")); it.Next() {
count++
}
return nil
})
if err != nil {
log.Fatal(err)
}
fmt.Printf("Counted %d elements", count)
}
This code returns 0 elements in reverse mode and 1000 elements with reverse = false
@brk0v That's the expected behaviour. Seek searches for a key greater than or equal to the specifed key. Say you have the following keys "00" ,"01", "02". When you seek for "01" in reverse mode the iteration would start at "00".
Similarly in your case since you are seeking for "0" it will go past all the values and you won't find anything.
Ok. Got it.
My case is to emulate some kind of buckets inside badger db. I was adding a prefix to keys. This prefix is a bucket. But with current iterator logic I can't find the last item in my "bucket".
You still can. You just need to choose the seek key wisely. Following from @janardhan1993's example, if you have [00, 01, 02], and you want to start from 01, then you can use 01~
(with tilde character). That way, the seek would stop just before 01, and it the first item would be 01.
He wanted to get the last item in the bucket, not the first, @manishrjain - I think, you misunderstood his statement.
Getting the reverse latest data from badgerdb is still Pain IA.
It's all about the choice of seek key. If you want the first data from reverse, you can just call Rewind after setting Reverse option.
There is no way to seek
reverse and rewind based on prefix.
You can set reverse and then Seek to prefix + 0xFF byte. That'd bring you to prefix.
@manishrjain I realize this comment is going on 3 years old, but it's subtly incorrect, and reverse iterating over a prefix with badger
is still very difficult to get right.
The method suggested in this issue: to find the last key with prefix [0x1, 0x2]
, set reverse=true and then Seek to [0x1, 0x2, 0xFF]
But this doesn't quite solve the problem when data exists with both the prefix and the 0xFF byte, as in the following example DB:
0x1 0x2 0x3
0x1 0x2 0x4
0x1 0x2 0xFF <----- reverse-Seek([0x1, 0x2, 0xFF])
0x1 0x2 0xFF 0x1
0x1 0x2 0xFF 0x2
0x1 0x3
0x1 0x3 0x1
I believe the solution that accounts for all cases is much more involved.
- You need to compute the first key lexicographically greater than
[prefix, 0xFF, 0xFF, ..., 0xFF]
, call thisincrement(prefix)
a.increment([0x1, 0x2]) = [0x1, 0x3]
b.increment([0x1, 0xFF, 0xFF]) = [0x2]
c.increment([0xFF, 0xFF, 0xFF]) = nil
, will need to callit.Rewind()
instead
i. ^it.Rewind()
does NOT work with a prefix ANDreverse=True
, will need to append one more0xFF
byte than the key length
ii. So one special case when all bytes are0xFF
:increment([0xFF, 0xFF, 0xFF])
=[0xFF, 0xFF, 0xFF, 0xFF]
- Then reverse-Seek on
increment(prefix)
- Finally, adjust by calling
it.Next()
if you arrive at a valid key that doesn't match the prefix. In this case, you have overshot the last key with the prefix by exactly one.
DB without exact match on incremented key:
0x1 0x2 0x3
0x1 0x2 0x4
0x1 0x2 0xFF
0x1 0x2 0xFF 0x1
0x1 0x2 0xFF 0x2 <--- reverse-Seek([0x1, 0x3]), it.Valid() == true, arrive at the correct final key for the prefix [0x1, 0x2]
0x1 0x3 0x1
DB with exact match on incremented key:
0x1 0x2 0x3
0x1 0x2 0x4
0x1 0x2 0xFF
0x1 0x2 0xFF 0x1
0x1 0x2 0xFF 0x2 <--- after it.Next(), arrive at the correct final key for the prefix [0x1, 0x2]
0x1 0x3 <----- reverse-Seek([0x1, 0x3]), it.Valid() == false, it.Item() != nil
0x1 0x3 0x1
@AlexMackowiak I don't think what you have will work correctly for case 1c. I've tried ... it doesn't play ball. If your prefix is 0xFF
and you want to iterate in reverse, without prior knowledge of the key length, I think it's impossible. It's almost like there's no choice but to reserve part of your key space for "impossible" values so that you can seek to them.
Therefore, IMO, the API / semantics for reverse iteration is quite bad, and needs overhauling.
@msackman Yeah you're actually right, I stumbled across this myself in some tests about a week later and I guess I forgot to follow up here. it.Rewind()
does not work with a prefix and reverse=true
how I expected it should, and I even filed an issue on the forum: https://discuss.dgraph.io/t/iterator-rewind-invalid-with-reverse-true-and-prefix-option-set/15518 but as far as I can tell this is the expected behavior?
I believe the actual case 1c that I ended up writing in my code was just to append as many 0xFF
values as (key length +1), and just hope that case doesn't come up much if ever in practice (which is a reasonable assumption for my application).
@AlexMackowiak Yep. I think in conclusion it's actually impossible to seek to the last item in a database without prior knowledge of the key length. Personally, I'm switching back to BoltDB as that seems to have a much better designed API IMO.
Just wanted to post correct code for reverse iteration.
func incrementPrefix(prefix []byte) []byte {
result := make([]byte, len(prefix))
copy(result, prefix)
var len = len(prefix)
for len > 0 {
if result[len-1] == 0xFF {
len -= 1
} else {
result[len-1] += 1
break
}
}
return result[0:len]
}
// Seeks to the last key that is valid for the given prefix. For
// correct results iterator must be setup to iterate iterate in the
// reverse direction and must not be limited to a certain prefix
func SeekLast(it *badger.Iterator, prefix []byte) {
i := incrementPrefix(prefix);
it.Seek(i);
if (it.Valid() && bytes.Equal(i, it.Item().Key())) {
it.Next();
}
}
Example Usage
func example(db *badger.DB, prefix []byte) error {
return db.View(func(txn *badger.Txn) error {
opts := badger.DefaultIteratorOptions
opts.Reverse = true
var it = txn.NewIterator(opts)
defer it.Close()
for SeekLast(it, prefix); it.ValidForPrefix(prefix); it.Next() {
// Do stuff with item...
}
return nil
})
}
Test
func Test_seekForReverseIterate(t *testing.T) {
type args struct {
prefix []byte
}
tests := []struct {
name string
args args
want []byte
}{
{"Standard Case", args{[]byte{0x05, 0x45, 0x77}}, []byte{0x05, 0x45, 0x78}},
{"Some 0xFF", args{[]byte{0x05, 0xFF, 0xFF}}, []byte{0x06}},
{"All 0XFF", args{[]byte{0xFF, 0xFF, 0xFF}}, []byte{}},
{"Empty", args{[]byte{}}, []byte{}},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if got := incrementPrefix(tt.args.prefix); !reflect.DeepEqual(got, tt.want) {
t.Errorf("seekForReverseIterate() = %v, want %v", got, tt.want)
}
})
}
}
func getKeys(db *badger.DB, prefix []byte) ([][]byte) {
result := [][]byte{};
err := db.View(func(txn *badger.Txn) error {
opts := badger.DefaultIteratorOptions
opts.Reverse = true
var it = txn.NewIterator(opts)
defer it.Close()
for SeekLast(it, prefix); it.ValidForPrefix(prefix); it.Next() {
result = append(result, it.Item().KeyCopy(nil));
}
return nil
})
if (err != nil) {
log.Fatal(err);
}
return result;
}
func Test_SeekLast(t *testing.T) {
// construct test database in memory
db, err := badger.Open(badger.DefaultOptions("").WithInMemory(true));
if err != nil {
log.Fatal(err)
}
defer db.Close();
keys := [][]byte{
{0xFF, 0xFF, 0xFF, 0xFF, 0xFF},
{0xFF, 0x1, 0xFF},
{0xFF, 0x1},
{0xFF},
{0x2},
{0x1, 0xFF, 0xFF, 0xFF, 0xFF},
{0x1, 0x1, 0xFF},
{0x1, 0x1},
{0x1},
}
err = db.Update(func(txn *badger.Txn) error {
for _, key := range keys {
err := txn.Set(key, []byte{});
if (err != nil) {
log.Fatal(err.Error());
}
}
return nil;
});
if (err != nil) {
log.Fatal(err);
}
// Function to get all keys
empty := [][]byte{};
// Do tests
type args struct {
prefix []byte
}
tests := []struct {
name string
args args
want [][]byte
}{
{"Prefix [0x01]", args{[]byte{0x01}}, keys[5:]},
{"Prefix []", args{[]byte{}}, keys},
{"Prefix nil", args{nil}, keys},
{"Prefix [0x00]", args{[]byte{0x00}}, empty},
{"Prefix [0x01 0x01]", args{[]byte{0x01, 0x01}}, keys[6:8]},
{"Prefix [0x02]", args{[]byte{0x02}}, keys[4:5]},
{"Prefix [0xFF]", args{[]byte{0xFF}}, keys[:4]},
{"Prefix [0xFF 0xFF]", args{[]byte{0xFF, 0xFF}}, keys[:1]},
{"Prefix [0xFF 0xFF 0xFF 0xFF 0xFF 0xFF]", args{[]byte{0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF}}, empty},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if got := getKeys(db, tt.args.prefix); !reflect.DeepEqual(got, tt.want) {
t.Errorf("getKeys(%v) = %v, want %v", tt.args, got, tt.want)
}
})
}
}
FWIW, I believe Iterator.Seek
should behave like SeekLast
by default for reverse direction iterators. At least there is a simple work around using the above.
Edit - 3/20/2022
+Updated implementation and test coverage of SeekLast
to correctly handle the case where incrementPrefix
returns a key present in the store.
+Updated the comment on SeekLast
to indicate that it will not work for iterators setup to only iterate over a specific prefix.
@PeterMcKinnis I believe this solution is currently missing one case and loses some efficiency over optimal reverse prefix scanning, great idea though we should definitely post some code for a correct implementation!
-
When the result of
incrementPrefix()
is itself a key in your database, the resulting iterator will not be valid for the prefix.
a. This is the "DB with exact match on incremented key" case from my original post, and requires anit.Next()
call to fix. -
You did legitimately find a workaround to the correctness issue when the prefix is entirely 0xFF!
a. This workaround is to ditch the badger Prefix IteratorOption entirely and check the prefix match at each key yourself
i. (Addingopts.Prefix = prefix
toTest_SeekLast
causes two test failures)
b. There is one downside to this though, finding the first prefix match in BadgerDB is already decently slow
and not specifying the Prefix IteratorOption will slow it down even further
The code I ended up writing to account for everything in the reverse prefix case is the following:
options := badger.DefaultIteratorOptions
options.Reverse = true
options.Prefix = rangePrefixKey
it := tx.NewIterator(options)
...
incRangePrefix := incrementPrefix(rangePrefixKey)
if len(incRangePrefix) > 0 {
it.Seek(incRangePrefix) // Point to the first key <= the range prefix
if it.Item() != nil && bytes.Equal(it.Item().Key(), incRangePrefix) {
// Exact match on incremented range prefix, need to move back one key for the last key matching the prefix
it.Next()
}
} else {
// Entire range prefix was just [0xFF, 0xFF, 0xFF, ..., 0xFF], use the last key in the database
// There's currently a bug with it.Rewind() when in reverse and with a prefix, it does not correctly find the largest key
// https://discuss.dgraph.io/t/iterator-rewind-invalid-with-reverse-true-and-prefix-option-set/15518
it.Seek(bytes.Repeat([]byte{0xFF}, len(rangePrefixKey)+1))
}
EDIT: Even this is sometimes invalid for the case where the prefix is all 0xFF. Our goal here is to point the iterator to the last key in the database, workarounds to it.Rewind()
being broken are to either know the length of the longest key in the database or make a new iterator without the prefix
option set and then it.Rewind()
will correctly go to the last key in the database.
I updated SeekLast
above to address the bug you mentioned. Good catch!
I implemented your solution but, it also fails the case where Your implementation also works correctly except in noted edge case. It will also be faster.incrementPrefix
is a valid key.
I replaced your SeekLast
with my implementation as follows and I don't see any test failures pertaining to that increment case
func SeekLast(it *badger.Iterator, prefix []byte) {
incRangePrefix := incrementPrefix(prefix)
if len(incRangePrefix) > 0 {
it.Seek(incRangePrefix) // Point to the first key <= the range prefix
if it.Item() != nil && bytes.Equal(it.Item().Key(), incRangePrefix) {
// Exact match on incremented range prefix, need to move back one key for the last key matching the prefix
it.Next()
}
} else {
// Entire range prefix was just [0xFF, 0xFF, 0xFF, ..., 0xFF], use the last key in the database
// There's currently a bug with it.Rewind() when in reverse and with a prefix, it does not correctly find the largest key
// https://discuss.dgraph.io/t/iterator-rewind-invalid-with-reverse-true-and-prefix-option-set/15518
it.Seek(bytes.Repeat([]byte{0xFF}, len(rangePrefixKey)+1))
}
}
I also have extensive test cases on this particular case, although they were written as part of a much larger, proprietary system that I wouldn't be able to share here.
I did not however have test coverage of the following case!
- When the prefix is all 0xFF
- but another key exists which matches the 0xFF ... 0xFF prefix but is longer than the prefix
My code was incorrect in this case, and to be truly correct you would need to know the length of the longest key in the database, as a workaround to it.Rewind()
being unusable when reverse=true
and prefix
is set. Or alternatively make a new iterator without prefix
set, so it's going to get pretty gross either way. I'll update my code posted with a note about this.