building indexes on objects

Question

joshrtay opened this issue 11 years ago

can you elaborate on how bytewise would help you with indexes? it's not immediately clear to me. i thought range queries were only performed on the keys in leveldb.

Dominic Tarr commented 11 years ago

NICE!

Dominic Tarr · Answer 1 · Thu Jul 25 2013 21:20:28 GMT+0800 (China Standard Time)

Here is an example where I have used bytewise to do just that.

https://github.com/dominictarr/level-search

Dean Landolt · Answer 2 · Thu Jul 25 2013 21:33:08 GMT+0800 (China Standard Time)

That's right -- range queries are only performed on keys, but there's a difference between the keyspace of the range of entities in your database (the things you want to index) and the keyspace of the database as a whole. Within this wider database keyspace you can keep all your entities and indexes on those entities. You have to keep them separated of course, and bytewise is a really nice way to partition this keyspace so that the two things don't mix, allowing you to use the atomic nature of batch writes to keep your indexes in sync with the entities they're indexing.
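To make the partitioning idea concrete, here is a minimal sketch that uses an in-memory `Map` as a stand-in for leveldb. The partition names (`'entity'`, `'index'`), the `putUser` helper and the `batch` function are all hypothetical, made up for illustration; only the `{ type, key, value }` shape of the operations mirrors what levelup's batch accepts. Keys are JSON-encoded tuples here purely for convenience, which is not how bytewise encodes them.

```javascript
// Sketch only: an in-memory stand-in for a key-value store.
// Keys are tuples whose first element names the partition.
const store = new Map(); // JSON-encoded key tuple -> value

// Hypothetical all-or-nothing batch: validate every op, then apply them all.
function batch(ops) {
  for (const op of ops) {
    if (op.type !== 'put' && op.type !== 'del') {
      throw new Error('bad op type: ' + op.type);
    }
  }
  for (const op of ops) {
    const k = JSON.stringify(op.key);
    if (op.type === 'put') store.set(k, op.value);
    else store.delete(k);
  }
}

// Write the entity and its email index entry in a single batch,
// removing the stale index entry if the email changed.
function putUser(id, user) {
  const entityKey = ['entity', 'user', id];
  const old = store.get(JSON.stringify(entityKey));
  const ops = [];
  if (old && old.email !== user.email) {
    ops.push({ type: 'del', key: ['index', 'user', 'email', old.email, id] });
  }
  ops.push({ type: 'put', key: entityKey, value: user });
  ops.push({ type: 'put', key: ['index', 'user', 'email', user.email, id], value: null });
  batch(ops);
}

// List the key tuples that live in one partition.
const keysIn = (partition) =>
  [...store.keys()].map((k) => JSON.parse(k)).filter((k) => k[0] === partition);

putUser('u1', { email: 'a@example.com', name: 'Ada' });
putUser('u1', { email: 'b@example.com', name: 'Ada' });

console.log(keysIn('entity'));
console.log(keysIn('index'));
```

Because the entity write and the index maintenance travel in the same batch, a reader never sees an entity without its index entry or an index entry pointing at an old value.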

How you actually go about building your indexes is up to you, but features of bytewise can be really handy for this too. The ability to reliably partition your keyspaces is critical for something like an attribute index, whose keys would typically be a combination of the attribute name and the value. You'll probably also want to keep the reference to the entity being indexed in the key. In fact, there may not be anything at all worth storing in the value of the index record -- the whole index might be stored as tuples in the keyspace. And this is fine -- because of the way bytewise orders arrays elementwise you can use leveldb as a tuple store, which is more general than a kv store.
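A rough sketch of the tuple-store idea, with a plain sorted array standing in for leveldb's ordered keyspace. The `compareTuple` and `range` helpers are hypothetical, and the type ordering (numbers before strings) is a simplification assumed for this example rather than bytewise's full ordering.

```javascript
// Sketch only: order tuples element by element, the way a tuple-aware
// key encoding orders arrays. Assumed type order: numbers before strings.
function compareScalar(a, b) {
  const rank = (v) => (typeof v === 'number' ? 0 : 1);
  if (rank(a) !== rank(b)) return rank(a) - rank(b);
  return a < b ? -1 : a > b ? 1 : 0;
}

function compareTuple(a, b) {
  const len = Math.min(a.length, b.length);
  for (let i = 0; i < len; i++) {
    const c = compareScalar(a[i], b[i]);
    if (c !== 0) return c;
  }
  return a.length - b.length; // a shorter tuple sorts before its extensions
}

// Index entries are [attribute, value, entityId]; no separate value is stored.
const index = [
  ['age', 31, 'user:carol'],
  ['name', 'bob', 'user:bob'],
  ['age', 9, 'user:bob'],
  ['age', 120, 'user:alice'],
  ['name', 'alice', 'user:alice'],
].sort(compareTuple);

// Range scan over the sorted keys: gte <= key < lt.
function range(sorted, gte, lt) {
  return sorted.filter(
    (k) => compareTuple(k, gte) >= 0 && compareTuple(k, lt) < 0
  );
}

// Every entity whose age is in [10, 200).
const hits = range(index, ['age', 10], ['age', 200]).map((k) => k[2]);
console.log(hits); // [ 'user:carol', 'user:alice' ]
```

The entity reference sits in the last slot of the key, so the scan itself returns everything needed to look the entities up.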

In my opinion the biggest value bytewise offers for indexing is in how it sorts non-string values -- something leveldb punts on out of the box. If you're trying to index numbers you could either conjure up some crazy padding scheme -- or just use bytewise. I guarantee the former will be brittle and very likely wrong. The way bytewise encodes numeric values will sort any possible javascript number correctly, even Infinity and -Infinity (but not NaN, which we fail on since it's nonsensical to sort).
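To illustrate why string keys go wrong and what an order-preserving encoding looks like, here is a small sketch. It is not bytewise's actual wire format: `encodeNumber` is a hypothetical helper that applies the general trick of writing the IEEE 754 double big-endian and flipping bits so that bytewise comparison matches numeric order.

```javascript
// Sketch only: an order-preserving byte encoding for JavaScript numbers.
// NOT bytewise's real format, just the same general idea.
function encodeNumber(n) {
  if (Number.isNaN(n)) throw new TypeError('NaN has no sort position');
  const buf = Buffer.alloc(8);
  buf.writeDoubleBE(n, 0);
  if (buf[0] & 0x80) {
    // Negative: invert every bit so larger magnitudes sort first.
    for (let i = 0; i < 8; i++) buf[i] ^= 0xff;
  } else {
    // Non-negative: set the sign bit so these sort after all negatives.
    buf[0] ^= 0x80;
  }
  return buf;
}

const values = [10, -Infinity, 2, -1.5, Infinity, 0, 1e21, -300];

// Naive string keys sort lexicographically, so "10" lands before "2".
const asStrings = values.map(String).sort();

// Encoded keys compared byte by byte come out in numeric order.
const asBytes = values
  .map((n) => ({ n, key: encodeNumber(n) }))
  .sort((a, b) => Buffer.compare(a.key, b.key))
  .map((e) => e.n);

console.log(asStrings);
console.log(asBytes);
```

Zero-padding can patch the `"10"` versus `"2"` case, but it still has to deal with negatives, fractions, exponents and the infinities, which is where hand-rolled schemes tend to break.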

When indexing into object values (which I believe is the subject of this issue) you can again take advantage of the partitioning abilities -- and the fact that you can sort arbitrarily nested arrays -- to store key paths into object values in your indexes. It's up to you exactly how you want to do this but can you see how you could use this to support something like mongo's object indexing?
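One possible shape for this, as a minimal sketch: walk the object and emit one index key per leaf, with the key path stored as a nested array. The `indexKeys` helper and the `[path, value, id]` key layout are assumptions made for this example, not something bytewise or any of the linked projects prescribes.

```javascript
// Sketch only: emit one [path, value, entityId] index key per leaf value.
function indexKeys(id, obj, path = []) {
  const keys = [];
  for (const [name, value] of Object.entries(obj)) {
    const here = path.concat(name);
    if (value !== null && typeof value === 'object') {
      // Recurse into nested objects, extending the key path.
      keys.push(...indexKeys(id, value, here));
    } else {
      keys.push([here, value, id]);
    }
  }
  return keys;
}

const doc = {
  name: 'Ada',
  address: { city: 'London', geo: { lat: 51.5 } },
};

const keys = indexKeys('doc:1', doc);
console.log(JSON.stringify(keys));
// [[["name"],"Ada","doc:1"],[["address","city"],"London","doc:1"],[["address","geo","lat"],51.5,"doc:1"]]
```

With keys laid out like this, a mongo-style query on `address.city` becomes a range scan over every key that starts with `[['address', 'city'], 'London']`.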

I'm going to close the issue since there's nothing actionable, but feel free to keep the conversation going. I'd be happy to go into more detail on this if you have more specific questions. I've got a fully fleshed out design in my head for a database that already does all of this, and I've been building out the necessary pieces (slowly).

Dean Landolt · Answer 3 · Thu Jul 25 2013 21:38:14 GMT+0800 (China Standard Time)

That's awesome, @dominictarr. I'll have to find some time to play around with this, but it looks really nice.

Dominic Tarr · Answer 4 · Fri Jul 26 2013 19:18:23 GMT+0800 (China Standard Time)

@deanlandolt check out @eugeneware's https://github.com/eugeneware/jsonquery
he's implemented the mongo query syntax as a stream filter,
by pulling the right indexes out of this, (TODO) could easily reimplement mongo on top of leveldb

Josh Taylor · Answer 5 · Sun Jul 28 2013 04:57:32 GMT+0800 (China Standard Time)

@deanlandolt I see. thanks for the thorough response.

keep us updated on the database. i'd love to help with it. any idea how the performance of a mongo clone built on top of leveldb would compare to mongodb?

Eugene Ware · Answer 6 · Tue Jul 30 2013 00:41:35 GMT+0800 (China Standard Time)

@deanlandolt I've just implemented the mongodb query syntax WITH index support on my new project at: https://github.com/eugeneware/jsonquery-engine

It's smart enough to take your mongodb queries, and then find any indexes that would speed up the queries.

It supports regular property indexes and also supports @dominictarr's pairs() indexing strategy out of the box.

It's built on my new generic query engine and indexing system called https://github.com/eugeneware/level-queryengine where you can plug in your own query language, and your own indexing strategy and use it with levelup.

It's brand new, and likely full of problems, but might be what you're looking for.

Josh Taylor · Answer 7 · Tue Jul 30 2013 01:11:12 GMT+0800 (China Standard Time)

@eugeneware very cool. have you tried benchmarking any queries against mongodb?

Dean Landolt · Answer 8 · Tue Jul 30 2013 09:36:10 GMT+0800 (China Standard Time)

@eugeneware awesome, thanks for chiming in with these links...

This is really exciting work -- nicely aligned with what I'd been scheming up, and really well executed! I need to find some time to go through it a bit closer.