facebook / rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.

Home Page: http://rocksdb.org

Feature request: "Multi" prefix extractor support

zaidoon1 opened this issue · comments

Say my key format is <account_id>:<user_id>:<some dynamic value>.

Today, we can create a prefix extractor/bloom filter on <account_id>:<user_id> to help with queries that start with a known <account_id>:<user_id>. HOWEVER, what we can't do today is ALSO set up a prefix extractor on <account_id>. That way, I could use bloom filters both for queries that happen to know the account id + user id combination and for queries that only happen to have an account id. Effectively, in db/sql terminology, this is like being able to create multiple indexes on the "columns" to optimize queries like: select * from blah where account_id = 123 and select * from blah where account_id = 345 and user_id = 678
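For concreteness, the two prefixes in question could be computed along these lines (a standalone sketch; PrefixUpToNthColon is a hypothetical helper for illustration, not an existing RocksDB extractor, though a custom SliceTransform could wrap logic like this):

```cpp
#include <cassert>
#include <string>

// Return the prefix of `key` up to and including the n-th ':' delimiter,
// or the whole key if fewer than n delimiters are present.
// n = 1 yields the <account_id>: prefix; n = 2 yields <account_id>:<user_id>:.
std::string PrefixUpToNthColon(const std::string& key, int n) {
  size_t pos = 0;
  for (int i = 0; i < n; ++i) {
    size_t next = key.find(':', pos);
    if (next == std::string::npos) return key;  // short key: whole key
    pos = next + 1;
  }
  return key.substr(0, pos);
}
```

The feature request is essentially to have filters for both n = 1 and n = 2 on the same column family, where today's prefix_extractor option allows only one choice.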

As far as I know, today we can only have one prefix extractor/bloom filter per cf, so we have the following workarounds, which are not ideal:

  1. Create another cf that duplicates the data, so that one cf has an <account_id>:<user_id> prefix extractor and the other has an <account_id> prefix extractor; depending on the query/what we already know, we look up the kv from the corresponding cf. The issue here is that we need extra disk space to store the duplicate data.

  2. Given that <account_id> is common to both prefix extractors (in this use case) and we always have it, we use it as the prefix extractor. However, we then miss the opportunity to optimize queries that also have <user_id>.

Looks like something similar was requested in https://groups.google.com/g/rocksdb/c/bb6Db8Y3xwU

@ajkr What do you think about a feature like this? It seems very useful/high impact, but I'm not sure what the level of effort would be.

Can @pdillinger's key segment filtering (

// A class for splitting a key into meaningful pieces, or "segments" for
// filtering purposes. Keys can also be put in "categories" to simplify
// some configuration and handling. To simplify satisfying some filtering
// requirements, the segments must encompass a complete key prefix (or the whole
// key) and segments cannot overlap.
//
// Once in production, the behavior associated with a particular Name()
// cannot change. Introduce a new Name() when introducing new behaviors.
// See also SstQueryFilterConfigsManager below.
//
// OTHER CURRENT LIMITATIONS (maybe relaxed in the future for segments only
// needing point query or WHERE filtering):
// * Assumes the (default) byte-wise comparator is used.
// * Assumes the category contiguousness property: that each category is
// contiguous in comparator order. In other words, any key between two keys of
// category c must also be in category c.
// * Assumes the (weak) segment ordering property (described below) always
// holds. (For byte-wise comparator, this is implied by the segment prefix
// property, also described below.)
// * Not yet compatible with user timestamp feature
//
// SEGMENT ORDERING PROPERTY: For maximum use in filters, especially for
// filtering key range queries, we must have a correspondence between
// the lexicographic ordering of key segments and the ordering of keys
// they are extracted from. In other words, if we took the segmented keys
// and ordered them primarily by (byte-wise) order on segment 0, then
// on segment 1, etc., then key order of the original keys would not be
// violated. This is the WEAK form of the property, where multiple keys
// might generate the same segments, but such keys must be contiguous in
// key order. (The STRONG form of the property is potentially more useful,
// but for bytewise comparator, it can be inferred from segments satisfying
// the weak property by assuming another segment that extends to the end of
// the key, which would be empty if the segments already extend to the end
// of the key.)
//
// The segment ordering property is hard to think about directly, but for
// bytewise comparator, it is implied by a simpler property to reason about:
// the segment prefix property (see below). (NOTE: an example way to satisfy
// the segment ordering property while breaking the segment prefix property
// is to have a segment delimited by any byte smaller than a certain value,
// and not include the delimiter with the segment leading up to the delimiter.
// For example, the space character is ordered before other printable
// characters, so breaking "foo bar" into "foo", " ", and "bar" would be
// legal, but not recommended.)
//
// SEGMENT PREFIX PROPERTY: If a key generates segments s0, ..., sn (possibly
// more beyond sn) and sn does not extend to the end of the key, then all keys
// starting with bytes s0+...+sn (concatenated) also generate the same segments
// (possibly more). For example, if a key has segment s0 which is less than the
// whole key and another key starts with the bytes of s0--or only has the bytes
// of s0--then the other key must have the same segment s0. In other words, any
// prefix of segments that might not extend to the end of the key must form an
// unambiguous prefix code. See
// https://en.wikipedia.org/wiki/Prefix_code In other other words, parsing
// a key into segments cannot use even a single byte of look-ahead. Upon
// processing each byte, the extractor decides whether to cut a segment that
// ends with that byte, but not one that ends before that byte. The only
// exception is that upon reaching the end of the key, the extractor can choose
// whether to make a segment that ends at the end of the key.
//
// Example types of key segments that can be freely mixed in any order:
// * Some fixed number of bytes or codewords.
// * Ends in a delimiter byte or codeword. (Not including the delimiter as
// part of the segment leading up to it would very likely violate the segment
// prefix property.)
// * Length-encoded sequence of bytes or codewords. The length could even
// come from a preceding segment.
// * Any/all remaining bytes to the end of the key, though this implies all
// subsequent segments will be empty.
// For each kind of segment, it should be determined before parsing the segment
// whether an incomplete/short parse will be treated as a segment extending to
// the end of the key or as an empty segment.
//
// For example, keys might consist of
// * Segment 0: Any sequence of bytes up to and including the first ':'
// character, or the whole key if no ':' is present.
// * Segment 1: The next four bytes, all or nothing (in case of short key).
// * Segment 2: An unsigned byte indicating the number of additional bytes in
// the segment, and then that many bytes (or less up to the end of the key).
// * Segment 3: Any/all remaining bytes in the key
//
// For an example of what can go wrong, consider using '4' as a delimiter
// but not including it with the segment leading up to it. Suppose we have
// these keys and corresponding first segments:
// "123456" -> "123"
// "124536" -> "12"
// "125436" -> "125"
// Notice how byte-wise comparator ordering of the segments does not follow
// the ordering of the keys. This means we cannot safely use a filter with
// a range of segment values for filtering key range queries.
//
// Also note that it is legal for all keys in a category (or many categories)
// to return an empty sequence of segments.
//
// To eliminate a confusing distinction between a segment that is empty vs.
// "not present" for a particular key, each key is logically associated with
// an infinite sequence of segments, including some infinite tail of 0-length
// segments. In practice, we only represent a finite sequence that (at least)
// covers the non-trivial segments.
//
, #12075) be used for this purpose?
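The delimiter counterexample near the end of that quoted comment is easy to check concretely (standalone sketch; FirstSegment is illustrative, not a RocksDB API):

```cpp
#include <cassert>
#include <string>

// First segment under the bad scheme from the example: bytes up to,
// but NOT including, the first '4' delimiter.
std::string FirstSegment(const std::string& key) {
  size_t pos = key.find('4');
  return pos == std::string::npos ? key : key.substr(0, pos);
}

// Keys in byte-wise order:  "123456" < "124536" < "125436"
// Their first segments:     "123"    , "12"     , "125"
// "12" sorts before "123", so segment order disagrees with key order,
// and a range of segment values cannot safely filter a key range query.
```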

Oh interesting, I didn't know this existed; I'll take a closer look at how it works. Is this being used in production anywhere right now? Any gotchas?

So reading this:

To simplify satisfying some filtering requirements, the segments must encompass a complete key prefix (or the whole key) and segments cannot overlap.

Specifically, the "segments cannot overlap" part means this won't work for my use case (unless I'm misunderstanding). To use the terminology being used here: given a key of the form <account_id>:<user_id>:<some dynamic value>, I would like to create two segments for filtering, <account_id> and <account_id>:<user_id>. Given that both segments share the <account_id> part, the two segments are "overlapping" and therefore not allowed right now?

Or... actually, maybe the whole point is to use the category concept? So I can have one category that contains two segments:

<account_id> & <user_id>, and then I can filter by "category" to satisfy queries like select * from blah where account_id = 345 and user_id = 678, or filter by "segment" (specifically the account_id segment) to satisfy queries like select * from blah where account_id = 123?

Also, per https://github.com/facebook/rocksdb/blob/v9.3.1/include/rocksdb/experimental.h#L334-L335, how does the filter being used here compare to bloom/ribbon performance-wise? Are there any benchmarks?

I think this is exactly what I need, but I would love more examples, and I will likely wait until bloom/ribbon filters are supported.

The API and functionality are not yet complete for the filtering you want, but the KeySegmentsExtractor API is intended to be complete.

Specifically, the segments cannot overlap part means this won't work for my use case (unless I'm misunderstanding)

You want a segment for each field in your key. This should be stable regardless of your desired filtering strategy (except when you extend or replace your key schema). You want a Bloom/ribbon filter on SelectKeySegment(0) and a Bloom/ribbon filter on SelectKeySegmentRange(0,1). Creating Bloom/ribbon filters is not yet available in the API:

https://github.com/facebook/rocksdb/blob/9.2.fb/include/rocksdb/experimental.h#L334-L335
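For illustration, here is a standalone sketch of the values those two filters would be built over, assuming each field keeps its trailing ':' so the segment prefix property holds. The Segment0/Segments0To1 helpers are stand-ins for what SelectKeySegment(0) and SelectKeySegmentRange(0, 1) would select, not the actual RocksDB API:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split a key into delimiter-terminated segments, keeping each ':' with the
// segment it ends (as the experimental.h comment recommends, so that the
// segment prefix property holds under the byte-wise comparator).
std::vector<std::string> ExtractSegments(const std::string& key) {
  std::vector<std::string> segs;
  size_t start = 0;
  while (start < key.size()) {
    size_t pos = key.find(':', start);
    if (pos == std::string::npos) {
      segs.push_back(key.substr(start));  // trailing segment, no delimiter
      break;
    }
    segs.push_back(key.substr(start, pos - start + 1));
    start = pos + 1;
  }
  return segs;
}

// Value a filter over segment 0 (the account id field) would be built on.
std::string Segment0(const std::vector<std::string>& segs) {
  return segs.at(0);
}

// Value a filter over segments 0..1 (account id + user id) would be built on.
std::string Segments0To1(const std::vector<std::string>& segs) {
  return segs.at(0) + segs.at(1);
}
```

So for a key like "345:678:payload", one filter would hold "345:" and the other "345:678:", covering both query shapes from the original request without duplicating data across column families.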

Got it, thanks for confirming! It's great that what I'm looking for is being worked on. Is there an existing issue that tracks the rest of this work, or should I just keep this issue open?

@pdillinger sorry, just wanted to confirm; I might have made an incorrect assumption. Based on what you said ("The API and functionality is not yet complete for the filtering you want"), is it safe to assume the filtering I want is planned to be done, or is this not something that is planned/being prioritized?