Multi-column indexes
baryluk opened this issue · comments
Hi,
I have many own indexing layer on top of badger, but found about badgerhold
few days ago, and I am considering migrating to using it. badgerhold
looks pretty nice.
The blocker issue is, how to do multi-column indexes, with proper iteration order on them.
I was thinking maybe something like this:
type Item struct {
ID int
Created time.Time
XKey string
Updated time.Time
}
func (i *Item) Type() string {
return "Item"
}
var itemIndex map[string]badgerhold.Index
func init() {
itemIndex = map[string]badgerhold.Index{
"XKey_Updated": {
IndexFunc: func(_ string, value interface{}) ([]byte, error) {
i := value.(*Item)
v1 := i.XKey
v2 := i.Updated
return append(badgerhold.DefaultEncode(v1), badgerhold.DefaultEncode(v2)...)
},
Unique: false,
},
}
}
func (i *Item) Indexes() map[string]badgerhold.Index {
return itemIndex
}
In fact I am not even sure badgerhold
supports inequeality test on indexed keys (i.e. to do range scans): https://github.com/timshannon/badgerhold/blob/master/query.go#L1012
I would like to this for example:
key := "foobar"
oldestAllowed := time.Now().Add(-1*time.Hour)
badgerhold.Where("XKey").Eq(key).And("Updated").Ge(oldestAllowed).Index("XKey_Updated")
(I would also like to be able to do Lt
/Le
, not just Ge
, and of course Ge+Lt/Le, to do range scans using an index.
Even if I make the index value ([]byte
) be properly sorted (i.e. make everything fixed size, and pad any values so it is correctly lexicographical order), I still do not think badgerhold
will use that.
Yeah, that's currently not supported. The index system is very simple in that the index is just array of byte / keys. If the underlying storage structure of the indexes were the same as the datastore itself, then we'd have a lot more options for range scans, and we'd have much better performance as well, because we could take advantage of page splits, and all of the other nice things the underlying datastore uses to keep reads and writes consistently fast as the data increases.
It's a similar issue to this one (timshannon/bolthold#106). The solution is a complete refactor of how indexes are handled, and I have yet to find a way to do it simply, without basically reinventing all of things that boltdb / badgerdb are already doing in their underlying datastores.
Until then, you could use a compound field to accomplish what you want, at the cost of duplicating some of your data, which is the same tradeoff you get from indexes, but with more manual work:
type Item struct {
ID int
Created time.Time
XKey string
Updated time.Time
XKey_Updated []byte `badgerhold:"index"` // could use string or whatever type you want
}
badgerhold.Where("XKey_Updated").Eq(key).And("Updated").Ge(oldestAllowed).Index("XKey_Updated")