timshannon / badgerhold

BadgerHold is an embeddable NoSQL store for querying Go types built on Badger

Multi-column indexes

baryluk opened this issue · comments

Hi,

I have my own indexing layer on top of badger, but I found out about badgerhold a few days ago, and I am considering migrating to it. badgerhold looks pretty nice.

The blocking issue is how to do multi-column indexes, with a proper iteration order on them.

I was thinking maybe something like this:

import (
	"time"

	"github.com/timshannon/badgerhold"
)

type Item struct {
	ID       int
	Created  time.Time
	XKey string
	Updated  time.Time
}

func (i *Item) Type() string {
	return "Item"
}

var itemIndex map[string]badgerhold.Index
func init() {
	itemIndex = map[string]badgerhold.Index{
		"XKey_Updated": {
			IndexFunc: func(_ string, value interface{}) ([]byte, error) {
				i := value.(*Item)
				v1, err := badgerhold.DefaultEncode(i.XKey)
				if err != nil {
					return nil, err
				}
				v2, err := badgerhold.DefaultEncode(i.Updated)
				if err != nil {
					return nil, err
				}
				return append(v1, v2...), nil
			},
			Unique: false,
		},
	}
}
func (i *Item) Indexes() map[string]badgerhold.Index {
	return itemIndex
}

In fact, I am not even sure badgerhold supports inequality tests on indexed keys (i.e. to do range scans): https://github.com/timshannon/badgerhold/blob/master/query.go#L1012

I would like to do this, for example:

key := "foobar"
oldestAllowed := time.Now().Add(-1*time.Hour)
badgerhold.Where("XKey").Eq(key).And("Updated").Ge(oldestAllowed).Index("XKey_Updated")

(I would also like to be able to do Lt/Le, not just Ge, and of course Ge+Lt/Le, to do range scans using an index.)

Even if I make the index value ([]byte) properly sorted (i.e. make everything fixed-size, and pad the values so they compare in correct lexicographical order), I still do not think badgerhold will use that.

Yeah, that's currently not supported. The index system is very simple: an index is just an array of byte keys. If the underlying storage structure of the indexes were the same as the datastore itself, then we'd have a lot more options for range scans, and much better performance as well, because we could take advantage of page splits and all of the other nice things the underlying datastore uses to keep reads and writes consistently fast as the data grows.

It's a similar issue to this one (timshannon/bolthold#106). The solution is a complete refactor of how indexes are handled, and I have yet to find a way to do it simply, without basically reinventing all of the things that boltdb / badgerdb are already doing in their underlying datastores.

Until then, you could use a compound field to accomplish what you want, at the cost of duplicating some of your data, which is the same tradeoff you get from indexes, but with more manual work:

type Item struct {
	ID       int
	Created  time.Time
	XKey string
	Updated  time.Time
	XKey_Updated []byte `badgerhold:"index"` // could use string or whatever type you want
}


badgerhold.Where("XKey_Updated").Eq(key).And("Updated").Ge(oldestAllowed).Index("XKey_Updated")