pubkey / event-reduce

An algorithm to optimize database queries that run multiple times https://pubkey.github.io/event-reduce/


Intuitive explanation of how the library works

Venryx opened this issue

I can understand the use-case for this library, but I don't understand (even marginally) the "how" of it.

Let's take the example query in the readme:

const exampleQuery: MongoQuery = {
    selector: {
        age: {
            $gt: 18
        },
        gender: 'm'
    },
    limit: 10,
    sort: ['name', '_id']
};

And let's say these are the initial database contents:

[
    {gender: "m", age: 10, name: "Bob"},
    {gender: "m", age: 20, name: "Dan"},
    {gender: "f", age: 10, name: "Alice"},
    {gender: "f", age: 20, name: "Sally"}
]

So then, the initial result set for the query would be this:

[
    {gender: "m", age: 20, name: "Dan"},
]
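For reference, that result is just what you would get by naively re-running the whole query over the collection; here is a minimal sketch of that expensive path (using a hypothetical Human type for the example documents):

// Naive re-evaluation of the example query over the full collection.
// This is the expensive path that event-reduce tries to avoid.
type Human = { gender: string; age: number; name: string };

function runQueryNaively(docs: Human[]): Human[] {
    return docs
        .filter(d => d.age > 18 && d.gender === 'm')   // selector
        .sort((a, b) => a.name.localeCompare(b.name))  // sort: ['name', '_id'] (the _id tie-break is omitted; these sample docs have none)
        .slice(0, 10);                                 // limit: 10
}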

However, let's say a new user was just added:

{gender: "m", age: 30, name: "Mike"}

The new entry is, of course, a match for the query. And since the name "Mike" sorts after "Dan", it should be added to the end of the query result set (rather than the start).

My specific question for this case then is: how on earth is your generic event-reduce algorithm able to know that the new "Mike" entry should be added after the "Dan" entry rather than before it?

This is of course easy to solve if you're hand-coding the query-result update system (e.g. doing a binary search through the old results to find the insert point for the new entry). But I currently have no idea how your library accomplishes this generically, based just on the MongoQuery params, the old results, and a change-stream entry. (It seems like it would have to recreate the entire MongoDB query-execution code within the library, yet the library is described as not being specific to MongoDB.)
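To be concrete about the hand-coded approach I mean, a sketch (a hypothetical helper of mine, not from the library):

// Binary-search the old (already sorted) results for the insert
// position of the new document, then splice it in, without ever
// re-running the query itself.
function insertSorted<T>(
    results: T[],
    doc: T,
    comparator: (a: T, b: T) => number
): void {
    let low = 0;
    let high = results.length;
    while (low < high) {
        const mid = (low + high) >> 1;
        if (comparator(results[mid], doc) < 0) {
            low = mid + 1;
        } else {
            high = mid;
        }
    }
    results.splice(low, 0, doc);
}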

As explaining the entire algorithm would be too much to ask, I give the specific case above in the hope that there is an intuitive answer for this small case, one that can give insight into how the library solves these cases in general.

As shown here, you have to pass a sortComparator that takes two documents and sorts them in the correct order, where "correct" is defined by the query.
The algorithm uses that function to determine whether Mike sorts before or after Dan.
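For the example query, such a comparator could look like this sketch (assuming the documents carry an _id field, which the sample data above omits):

// A sortComparator for sort: ['name', '_id']. It must impose a total,
// deterministic order on any two documents.
type Human = { _id: string; gender: string; age: number; name: string };

const sortComparator = (a: Human, b: Human): number => {
    if (a.name !== b.name) {
        return a.name < b.name ? -1 : 1;
    }
    return a._id < b._id ? -1 : 1; // tie-break on _id so the order is deterministic
};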

In most cases the sortComparator already exists somewhere and you can just import it, as I have done in the minimongo example.

Ah okay, so sorting is handled by running a sort operation after the new entry is added to the result set.
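Conceptually something like this, I assume (a sketch; the names are mine, not the library's):

// Handle an insert event whose document matches the query: append the
// new document, then re-sort the results with the query's comparator.
function applyMatchingInsert<T>(
    previousResults: T[],
    newDoc: T,
    sortComparator: (a: T, b: T) => number
): T[] {
    return [...previousResults, newDoc].sort(sortComparator);
}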

Looking into the code a bit, I see that it also makes use of the minimongo library for some of the query-matching work. That helps explain a lot, as I was confused about how just generating a mapping of 2^17 combinations could somehow capture the "logic" of query matching, sorting, etc.

That generated mapping must be accomplishing something else, then: something that cannot (realistically) be accomplished just with regular if-then logic.
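My rough mental model of that mapping, as a sketch (hypothetical names, not the library's actual state functions): each state function answers one yes/no question about the event and the previous results, the answers form a bit string, and the bit string looks up a precomputed action.

type Action = 'doNothing' | 'insertAtSortPosition' | 'removeExisting' | 'runFullQueryAgain';

interface StateInput {
    event: { operation: 'INSERT' | 'UPDATE' | 'DELETE'; doc: unknown };
    previousResults: unknown[];
    queryMatchesDoc: (doc: unknown) => boolean;
    limit: number;
}

const stateFunctions: Array<(input: StateInput) => boolean> = [
    input => input.event.operation === 'INSERT',          // wasInsert
    input => input.queryMatchesDoc(input.event.doc),      // doesMatchNow
    input => input.previousResults.length >= input.limit  // wasLimitReached
    // ...more state functions, 17 in total, hence 2^17 combinations
];

function resolveAction(input: StateInput, table: Map<string, Action>): Action {
    const key = stateFunctions.map(fn => (fn(input) ? '1' : '0')).join('');
    // For state combinations with no safe single action, fall back to
    // re-running the full query (which would explain a solve rate below 100%).
    return table.get(key) ?? 'runFullQueryAgain';
}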

I will not waste your time by asking for an explanation at this point (since I don't need it right now, and I haven't yet done my due diligence in reading the documentation and code), but it does make me curious, particularly that even with all the permutations mapped, it reaches a 94% solve rate rather than the 100% I would have expected.