openacid / slim

Surprisingly space-efficient trie in Golang (11 bits/key; 100 ns/get).

Home Page: https://openacid.github.io/

Range-scan is supported!(under development)

greyireland opened this issue

"Range-scan is supported! (under development)"
When will this feature be released?
It would be very useful!

It should be out in one or two months, I think. :)

The functionality is actually ready, but it needs some refactoring before it can be published. I can't wait any longer either. 😆

Any progress here? We need some way of iterating through the data structure

In my development branch, there is an implementation of Scan() in 9520481.

The API is simple and raw-bytes oriented:

// NextRaw returns the next key-value pair as []byte.
type NextRaw func() ([]byte, []byte)

// Scan starts from the specified key. It returns a function `next()` that yields the next key and value each time it is called.
// `next()` returns nil after all keys have been yielded.
// The key and value it yields are temporary []byte slices, i.e., the previously returned slices become invalid on the next call to `next()`.
//
// Since 0.5.11
func (st *SlimTrie) Scan(key string, withValue bool) NextRaw

Usage:

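// keys, idx and ta come from the surrounding test: keys is the sorted key
// list, idx is assumed to be the position of "foo" in it, and ta is the
// test's assertion helper.
nxt := st.Scan("foo", true)

for i := int32(idx); i < int32(1000); i++ {
    key := keys[i]
    gotKey, _ := nxt()
    // Each call should yield the next key in sorted order.
    ta.Equal([]byte(key), gotKey, "scan from: %s, idx: %d", "foo", idx)
}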

Scan() works well, but I am not sure whether the API is comfortable for end users.
Could you tell me how you want to use it? That would help me stabilize the API and get it published.
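
The "temporary slice" rule in the Scan() comment is easy to trip over, so here is a minimal sketch of that contract. It does not use slim: a sorted slice stands in for the trie, and `nextRaw` and `scanSlice` are hypothetical names made up for this illustration. The sketch always yields a nil value to stay short.

```go
package main

import (
	"fmt"
	"sort"
)

// nextRaw mirrors the shape of the NextRaw type quoted above.
type nextRaw func() ([]byte, []byte)

// scanSlice is a hypothetical stand-in for Scan(): it walks a sorted slice
// of keys starting at the first key >= start and reuses one buffer for
// every key it yields. The value is always nil in this sketch.
func scanSlice(keys []string, start string) nextRaw {
	i := sort.SearchStrings(keys, start)
	buf := make([]byte, 0, 64)
	return func() ([]byte, []byte) {
		if i >= len(keys) {
			return nil, nil
		}
		// Overwrite the shared buffer; earlier results change with it.
		buf = append(buf[:0], keys[i]...)
		i++
		return buf, nil
	}
}

func main() {
	keys := []string{"bar", "foo", "fop", "zap"}
	next := scanSlice(keys, "foo")

	var kept [][]byte
	for k, _ := next(); k != nil; k, _ = next() {
		// Copy k before calling next() again, because the buffer is reused.
		kept = append(kept, append([]byte(nil), k...))
	}
	for _, k := range kept {
		fmt.Println(string(k))
	}
}
```

Without the copy inside the loop, every element of `kept` would alias the same buffer and end up showing the last key.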

Thanks for the response @drmingdrmer. That looks like something I can try working with.

For the API, I've been looking at some other Go data structure libraries as an alternative to slim because of the iteration need, and most of them seem to have an API that takes a function that is applied to each matching key:

  • google/btree
  • armon/go-radix (Golang implementation of Radix trees)

I kind of like that. What do you think of providing that?
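
For comparison, a callback-style API can be layered on top of a `next()`-style iterator with a small adapter. The sketch below is only an illustration: `nextFn`, `forEach`, and `fromSlice` are hypothetical names, not part of slim, and the "return false to stop" convention is borrowed from google/btree's iterator functions.

```go
package main

import "fmt"

// nextFn yields the next key, or nil when the scan is exhausted.
type nextFn func() []byte

// forEach drains next and hands every key to visit, stopping early when
// visit returns false.
func forEach(next nextFn, visit func(key []byte) bool) {
	for k := next(); k != nil; k = next() {
		if !visit(k) {
			return
		}
	}
}

// fromSlice is a hypothetical key source used only for this demo.
func fromSlice(keys []string) nextFn {
	i := 0
	return func() []byte {
		if i >= len(keys) {
			return nil
		}
		k := []byte(keys[i])
		i++
		return k
	}
}

func main() {
	next := fromSlice([]string{"a", "b", "c"})
	forEach(next, func(key []byte) bool {
		fmt.Println(string(key))
		return string(key) != "b" // stop once "b" has been visited
	})
}
```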

It might also be nice to provide a Scan(start, end) API which bounds the scan to a start prefix and an end prefix. I want to use slim to store a list of file paths, and the operations I do are:

  1. Scan the entire data structure from start to end (i.e. scan through the flat list of files).
  2. Directory scan: given a directory, e.g. foo/bar/, I want to scan all files with the prefix foo/bar/. With btree I can call AscendRange(start="foo/bar", end="foo/bas", func), where "foo/bas" is like foo/bar + 1 in terms of the prefix. This gives me all files prefixed with foo/bar.
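
The "prefix + 1" trick in item 2 can be sketched as follows. This is an illustration over a plain sorted slice using only the standard library, not slim or btree code; `prefixEnd` and `withPrefix` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"sort"
)

// prefixEnd returns the smallest string greater than every string that has
// the given prefix, or "" when no such bound exists (the prefix is empty or
// consists only of 0xff bytes).
func prefixEnd(prefix string) string {
	b := []byte(prefix)
	for i := len(b) - 1; i >= 0; i-- {
		if b[i] < 0xff {
			b[i]++
			return string(b[:i+1])
		}
	}
	return ""
}

// withPrefix returns the keys of a sorted slice that start with prefix,
// i.e. the half-open range [prefix, prefixEnd(prefix)).
func withPrefix(sorted []string, prefix string) []string {
	lo := sort.SearchStrings(sorted, prefix)
	hi := len(sorted)
	if end := prefixEnd(prefix); end != "" {
		hi = sort.SearchStrings(sorted, end)
	}
	return sorted[lo:hi]
}

func main() {
	paths := []string{"foo/a", "foo/bar/x", "foo/bar/y", "foo/bas", "foo/c"}
	sort.Strings(paths)
	fmt.Println(prefixEnd("foo/bar"))          // foo/bas
	fmt.Println(withPrefix(paths, "foo/bar/")) // [foo/bar/x foo/bar/y]
}
```

Incrementing the last byte works because Go compares strings byte by byte; trailing 0xff bytes are dropped first since they cannot be incremented.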

By the way, I also wanted to say I'm really impressed with slim. For around 9M file paths at 1.5GB on disk, slim compresses to 350MB while btree and radix use 2.5-3GB. Really strong work 👏

@goldsborough: it seems what you need is quite clear 😄:

  • being able to specify an open or closed boundary for both the start and the end,
  • and providing a receiver function that handles the items.

I'll send you a pull request with the changes soon. 😆
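
To make the two requirements above concrete, here is a rough sketch of a range scan with open or closed boundaries and a receiver function. It runs over a sorted slice of unique keys rather than a SlimTrie, and `bound` and `scanRange` are hypothetical names; the API that was eventually added to slim may look quite different.

```go
package main

import (
	"fmt"
	"sort"
)

// bound is a hypothetical range endpoint: a key plus whether the key itself
// belongs to the range (closed) or is excluded from it (open).
type bound struct {
	key    string
	closed bool
}

// scanRange calls recv for each key of sorted that lies between start and
// end, in order, and stops early when recv returns false.
// It assumes sorted is in ascending order and contains no duplicates.
func scanRange(sorted []string, start, end bound, recv func(key string) bool) {
	lo := sort.SearchStrings(sorted, start.key)
	if !start.closed && lo < len(sorted) && sorted[lo] == start.key {
		lo++ // open start: skip the boundary key itself
	}
	for _, k := range sorted[lo:] {
		if k > end.key || (k == end.key && !end.closed) {
			return
		}
		if !recv(k) {
			return
		}
	}
}

func main() {
	keys := []string{"a", "b", "c", "d"}
	// Half-open range [b, d): visits b and c.
	scanRange(keys, bound{"b", true}, bound{"d", false}, func(k string) bool {
		fmt.Println(k)
		return true
	})
}
```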

In #142 we added three APIs to support range scan: ScanFrom(), ScanFromTo() and NewIter().
Issue closed!

If you have any other needs, let me know! 😆