deanlandolt / bytewise

Binary serialization of arbitrarily complex structures that sort bytewise

building indexes on objects

joshrtay opened this issue · comments

can you elaborate on how bytewise would help you with indexes? it's not immediately clear to me. i thought range queries were only performed on the keys in leveldb.

Here is an example where I have used bytewise to do just that.

https://github.com/dominictarr/level-search

That's right -- range queries are only performed on keys, but there's a difference between the keyspace of the range of entities in your database (the things you want to index) and the keyspace of the database as a whole. Within this wider database keyspace you can keep all your entities and the indexes on those entities. You have to keep them separated of course, and bytewise is a really nice way to partition this keyspace so that the two things don't mix, allowing you to use the atomic nature of batch writes to keep your indexes in sync with the entities they're indexing.
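To make the partitioning idea concrete, here is a minimal sketch. The helper name `putWithIndexes` and the `'entity'` / `'index'` prefixes are made up for illustration, and keys are shown as plain arrays rather than bytewise-encoded buffers. The operations use the `{ type, key, value }` shape that levelup's `batch()` accepts, so handing the whole array to one batch call would write the entity and its index records atomically.

```javascript
// Hypothetical helper: builds the batch operations for saving one entity
// together with its index records. Keys are shown as arrays (tuples); in a
// real setup they would be encoded (e.g. with bytewise) before being written.
function putWithIndexes(id, entity, indexedAttrs) {
  const ops = [
    // Entities live under their own top-level prefix...
    { type: 'put', key: ['entity', id], value: entity },
  ];
  for (const attr of indexedAttrs) {
    if (attr in entity) {
      // ...and index records live under a different prefix, so the two
      // partitions of the keyspace never interleave.
      ops.push({
        type: 'put',
        key: ['index', attr, entity[attr], id],
        value: '',
      });
    }
  }
  return ops;
}

const ops = putWithIndexes('u1', { name: 'ada', age: 36 }, ['age', 'name']);
console.log(ops.map((op) => JSON.stringify(op.key)));
```

Because every operation for one entity ends up in the same array, a single batch write either applies all of them or none, which is what keeps the index from drifting out of sync.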

How you actually go about building your indexes is up to you, but features of bytewise can be really handy for this too. The ability to reliably partition your keyspaces is critical for something like an attribute index, whose keys would typically be a combination of the attribute name and the value. You'll probably also want to keep the reference to the entity being indexed in the key. In fact, there may not be anything at all worth storing in the value of the index record -- the whole index might be stored as tuples in the keyspace. And this is fine -- because of the way bytewise orders arrays elementwise you can use leveldb as a tuple store, which is more general than a kv store.
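As a rough illustration of what elementwise ordering buys you, the sketch below sorts index tuples with a hand-written comparator and then does a range scan over them. This is not bytewise's implementation: `compareTuples` is a hypothetical stand-in that only handles numbers and strings, and it assumes a tuple that is a prefix of another sorts first.

```javascript
// Compare two scalars of the same type (numbers or strings only, for this sketch).
function compareScalar(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Elementwise comparison: the first differing element decides the order.
// If one tuple is a prefix of the other, the shorter one sorts first.
function compareTuples(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const c = compareScalar(a[i], b[i]);
    if (c !== 0) return c;
  }
  return a.length - b.length;
}

// Index records of the form [attribute, value, entityId]; nothing is stored
// in the value, the key itself is the record.
const indexKeys = [
  ['age', 36, 'u1'],
  ['name', 'ada', 'u1'],
  ['age', 9, 'u3'],
  ['age', 36, 'u2'],
];
const sorted = indexKeys.slice().sort(compareTuples);

// "Find every entity with age 36" becomes a range scan between two bounds.
const lower = ['age', 36];
const upper = ['age', 37];
const matches = sorted
  .filter((k) => compareTuples(k, lower) >= 0 && compareTuples(k, upper) < 0)
  .map((k) => k[2]);

console.log(matches);
```

In leveldb the filter step would be a real range read over the sorted keys rather than a scan of an in-memory array.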

In my opinion the biggest value bytewise offers for indexing is in how it sorts non-string values -- something leveldb punts on out of the box. If you're trying to index numbers you could either conjure up some crazy padding scheme -- or just use bytewise. I guarantee the former will be brittle and very likely wrong. The way bytewise encodes numeric values will sort any possible javascript number correctly, even Infinity and -Infinity (but not NaN, which we fail on since it's nonsensical to sort).
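For a feel of how a number can be made to sort correctly as raw bytes, here is a sketch of one well-known technique: write the IEEE 754 double big-endian, then flip the sign bit for non-negative values and invert every bit for negative ones. It is an assumption that this resembles what bytewise does internally; the real encoding also has to distinguish types, which this sketch ignores. `encodeNumber` is a hypothetical name.

```javascript
// Encode a JavaScript number as 8 bytes whose bytewise order matches
// numeric order. NaN is rejected because it has no sensible sort position.
function encodeNumber(n) {
  if (typeof n !== 'number' || Number.isNaN(n)) {
    throw new TypeError('cannot encode ' + String(n));
  }
  const buf = Buffer.alloc(8);
  buf.writeDoubleBE(n, 0);
  if (buf[0] & 0x80) {
    // Sign bit set (negative): invert all bits so that larger magnitudes
    // sort earlier and every negative sorts before every non-negative.
    for (let i = 0; i < 8; i++) buf[i] ^= 0xff;
  } else {
    // Non-negative: set the top bit so these sort after all negatives.
    buf[0] ^= 0x80;
  }
  return buf;
}

const values = [10, -1, 2, -Infinity, 0.5, Infinity, -100, 1e21];

// Sort purely by comparing encoded bytes, the way leveldb compares keys.
const byBytes = values
  .slice()
  .sort((a, b) => Buffer.compare(encodeNumber(a), encodeNumber(b)));

console.log(byBytes);
```

Note that `-0` and `0` get different encodings here, with `-0` sorting immediately before `0`.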

When indexing into object values (which I believe is the subject of this issue) you can again take advantage of the partitioning abilities -- and the fact that you can sort arbitrarily nested arrays -- to store key paths into object values in your indexes. It's up to you exactly how you want to do this but can you see how you could use this to support something like mongo's object indexing?
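One possible shape for this, purely as a sketch: walk the object and emit one index key per leaf, with the key path itself stored as a nested array inside the key tuple. The helper `pathIndexKeys` and the `'index'` prefix are hypothetical, and arrays inside the entity are treated as opaque leaf values to keep things short.

```javascript
// Hypothetical helper: emit one index key per leaf value in an object.
// Each key has the form ['index', keyPath, leafValue, entityId], where
// keyPath is itself an array such as ['address', 'city'].
function pathIndexKeys(id, value, path = []) {
  const isPlainObject =
    value !== null && typeof value === 'object' && !Array.isArray(value);
  if (isPlainObject) {
    // Recurse into nested objects, extending the key path as we go.
    return Object.keys(value).flatMap((k) =>
      pathIndexKeys(id, value[k], path.concat(k))
    );
  }
  return [['index', path, value, id]];
}

const keys = pathIndexKeys('u1', {
  name: 'ada',
  address: { city: 'London', zip: 'N1' },
});

console.log(keys.map((k) => JSON.stringify(k)));
```

A query on a nested field such as `address.city` would then turn into a range scan over keys that start with `['index', ['address', 'city'], ...]`.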

I'm going to close the issue since there's nothing actionable, but feel free to keep the conversation going. I'd be happy to go into more detail on this if you have more specific questions. I've got a fully fleshed out design in my head for a database that already does all of this, and I've been building out the necessary pieces (slowly).

That's awesome, @dominictarr. I'll have to find some time to play around with this, but it looks really nice.

@deanlandolt check out @eugeneware's https://github.com/eugeneware/jsonquery
he's implemented the mongo query syntax as a stream filter,
by pulling the right indexes out of this, (TODO) could easily reimplement mongo on top of leveldb

@deanlandolt I see. thanks for the thorough response.

keep us updated on the database. i'd love to help with it. any idea how the performance of a mongo clone built on top of leveldb would compare to mongodb?

@deanlandolt I've just implemented the mongodb query syntax WITH index support on my new project at: https://github.com/eugeneware/jsonquery-engine

It's smart enough to take your mongodb queries, and then find any indexes that would speed up the queries.

It supports regular property indexes and also supports @dominictarr's pairs() indexing strategy out of the box.

It's built on my new generic query engine and indexing system called https://github.com/eugeneware/level-queryengine where you can plug in your own query language and your own indexing strategy, and use it with levelup.

It's brand new, and likely full of problems, but might be what you're looking for.

NICE!

@eugeneware very cool. have you tried benchmarking any queries against mongodb?

@eugeneware awesome, thanks for chiming in with these links...

This is really exciting work -- nicely aligned with what I'd been scheming up, and really well executed! I need to find some time to go through it a bit closer.