rethinkdb / rethinkdb

The open-source database for the realtime web.

Home Page: https://rethinkdb.com

For ordered sequences, have `changes()` return the new/old position

neumino opened this issue · comments

Somewhat related to #3709.

Typically if I want to build a leaderboard, I'll run something like this:

r.table('users').orderBy({index: 'score'}).limit(10).changes()

And then try to maintain an array of the best users.

The changes I'll get will be like this:

{
    new_val: {id: 14, score: 68},
    old_val: {id: 23, score: 10} 
}

The problem is that while I can remove the last element {id: 23, score: 10} from my array, I have no idea where to insert the new document {id: 14, score: 68} without scanning through my array.
It would be nice if changes() returned the new/old position for ordered sequences. Augmenting the data the way ReQL does for getNearest seems like a good idea to me.

Sounds like a good idea.

One more little thing: would it be possible to return a new type, ORDERED_FEED, for the same reason ATOM_FEED is useful?

In the case of an ordered sequence, I would like to have thinky maintain an array and just emit events when the array is updated.

I added a new issue for the ORDERED_FEED thing, since that's pretty easy. Returning the index is a little bit harder because of squashing.

Consider the case where the row A at position 5 is deleted, and then the row that is now at position 5 is updated from B to B'. At that point the queued-up changes look like [{old_val: A, new_val: nil, old_position: 5}, {old_val: B, new_val: B', old_position: 5, new_position: 5}]. Now if a new row A' is inserted that takes position 5, we squash it onto the first change, and we need to produce [{old_val: A, new_val: A', old_position: 5, new_position: 5}, {old_val: B, new_val: B', old_position: 6, new_position: 6}]. That is, squashing the new change onto the first change requires us to update the positions in the second change.

We could get around that by keeping a snapshot of the pre-batch ordered set around in memory and adding the position tags when the batch is sent, but that would double memory usage in the worst case and take a fair amount of implementation effort.
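To make the squashing problem concrete, here is a toy sketch (plain JavaScript, not the server implementation) of the bookkeeping described above. The function name and the exact change-object shape are made up for the illustration; the point is that squashing an insert onto an earlier queued deletion forces a position update in every change queued after it.

```javascript
// Toy sketch (not the server implementation): a queue of pending changes
// with positions, and what has to happen when an insert is squashed onto
// an earlier queued deletion at the same position.
function squashInsert(queued, newVal, pos) {
  for (var i = 0; i < queued.length; i++) {
    var c = queued[i];
    if (c.new_val === null && c.old_offset === pos) {
      // Squash the insert onto the queued deletion at this position.
      c.new_val = newVal;
      c.new_offset = pos;
      // The insert pushes later rows down by one, so the positions in
      // every change queued after this one must be bumped.
      for (var j = i + 1; j < queued.length; j++) {
        if (queued[j].old_offset !== null && queued[j].old_offset >= pos) {
          queued[j].old_offset += 1;
        }
        if (queued[j].new_offset !== null && queued[j].new_offset >= pos) {
          queued[j].new_offset += 1;
        }
      }
      return queued;
    }
  }
  // No deletion to squash onto: queue the insert as its own change.
  queued.push({old_val: null, new_val: newVal, old_offset: null, new_offset: pos});
  return queued;
}

// The scenario from the comment above: A at position 5 was deleted, then
// B at (the new) position 5 was updated to B'.
var queued = [
  {old_val: "A", new_val: null, old_offset: 5, new_offset: null},
  {old_val: "B", new_val: "B'", old_offset: 5, new_offset: 5}
];
// Inserting A' at position 5 squashes onto the first change and shifts
// the positions recorded in the second one from 5 to 6.
squashInsert(queued, "A'", 5);
```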

For some situations (e.g. r.table("foo").orderBy(...).changes()) it's impractical to compute the old and new positions. So I think it should be an optarg to .changes(), and it should error if it's impractical to compute positions. Also, #3719 might make this unnecessary.

For ordered feeds, could we maybe squash only the last change? I'm actually fine with not squashing things at all at first.

@timmaxw -- why would it be impractical to compute the old and new positions? Isn't that "just" a matter of keeping a count of children on each node of the tree?
While it would require more updates to the tree, it would also make things like orderBy(...).skip(x).limit(y) more efficient, no?

Since we don't currently support r.table("foo").orderBy(...).changes() without a limit(...), I don't think we have to worry too much about that now. In the future I think throwing an error when it's impractical to compute positions will work fine.

For small result sets #3719 might indeed be an alternative. One thing to note, though, is that it doesn't save any computation (not asymptotically, at least) compared to iterating through the current result set on the client and finding the positions of the old and new values that way.
@neumino what do you think about that?
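For reference, the client-side alternative mentioned above looks roughly like this. It's a sketch assuming (made-up names) results ordered by a `score` field descending with primary key `id`: the client scans its own copy of the result set to find both positions, which is O(n) per change.

```javascript
// Sketch of the client-side alternative: derive both positions by scanning
// the result set the client already holds.
function applyChange(docs, change) {
  if (change.old_val != null) {
    var oldPos = docs.findIndex(function (d) { return d.id === change.old_val.id; });
    if (oldPos !== -1) docs.splice(oldPos, 1); // linear scan for the old position
  }
  if (change.new_val != null) {
    var newPos = docs.findIndex(function (d) { return d.score < change.new_val.score; });
    if (newPos === -1) newPos = docs.length;
    docs.splice(newPos, 0, change.new_val);    // linear scan for the new position
  }
  return docs;
}

// A change like the leaderboard example at the top of the thread:
var docs = [{id: 7, score: 90}, {id: 3, score: 75}, {id: 23, score: 10}];
applyChange(docs, {new_val: {id: 14, score: 68}, old_val: {id: 23, score: 10}});
// docs now holds scores 90, 75, 68
```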

For making this work with squash, we could only keep the primary keys of entries around in the pre-batch ordered set to avoid too much memory overhead. Of course that wouldn't make it any easier to implement.

@neumino The idea of adding subtree sizes to the btree has been suggested a few times, mostly in the context of making count() on ranges efficient. The problem is that we would need to keep those counts up to date, which would mean that whenever we insert or delete a document in a leaf, we have to rewrite O(log n) blocks up to the top of the btree. That's impractical. We could keep the counts in the LBA instead, which is more efficient for this sort of thing, but that comes with some memory overhead.
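For illustration, here is the lookup side of the subtree-size idea, simplified to a binary tree rather than RethinkDB's btree (the comment above is about the update cost, which this sketch doesn't show): with sizes cached on each node, the position of a key is an O(log n) walk from the root.

```javascript
// Simplified sketch of the subtree-size idea on a binary tree: with sizes
// cached, computing the position of a key never has to scan the leaves.
function sizeOf(node) { return node === null ? 0 : node.size; }

// rank = number of keys in the tree that are smaller than `key`
function rank(node, key) {
  if (node === null) return 0;
  if (key <= node.key) return rank(node.left, key);
  return sizeOf(node.left) + 1 + rank(node.right, key);
}

// Helper to hand-build nodes with their cached subtree size.
function n(key, left, right) {
  return {key: key, left: left, right: right,
          size: 1 + sizeOf(left) + sizeOf(right)};
}

// A balanced tree over the keys 1..7.
var root = n(4, n(2, n(1, null, null), n(3, null, null)),
                n(6, n(5, null, null), n(7, null, null)));
// rank(root, 5) visits two nodes instead of scanning: keys 1..4 are smaller.
```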

@danielmewes -- I haven't really thought about this enough, and I would guess we probably won't find time to do this in 2.0; could we bump it to subsequent and settle it later?

@mlucy Yes definitely.

FYI, the lack of ordering information in feeds has come up literally every time I've explained .orderBy.limit.changes to a user. As far as prioritization of ReQL issues goes, I think this one is pretty high up there.

I think if we implement #3719 this one becomes less urgent. But we should make sure to do at least one of them for 2.1.

I'm not opposed to this, but there's a lot scheduled for 2.1, so unless it's absolutely vital I'd prefer to just do #3719.

@coffeemug, @danielmewes -- are we actually doing this for 2.1, or do you think we can get away with just doing #3719?

I think #3719 is good enough for 2.1.

We all seem to agree that this is useful, but we would have to discuss the names of the arguments and how to integrate the positions into the events.

Moving out of 2.1.

I agree with @danielmewes -- I think #3719 is sufficient for 2.1.

commenting to register my interest publicly, as well as to subscribe to further updates

I'm also going to comment here for a vote in support 👍

@dalanmiller also mentioned that it would be nice if we could support this for unlimited orderBy changefeeds (e.g. r.table(...).orderBy({index: ...}).changes()).
It sounds like it wouldn't easily work with our current implementation of those, and I can't see an obvious way of doing it without a lot of memory overhead. But we should think about it a bit.

I'd like to understand the use case for that better. I can't easily think of one, and I don't think we should worry about it unless a compelling use case actually exists.

Honestly, I don't think positions are important to support in changefeeds. RethinkDB already maintains the ordered set, just without positions. Positions can be recalculated efficiently in the client.

I think supporting it for non-limited orderBy.changes would mostly improve the consistency of the query language and the available features.
If it's non-trivial to do, I don't think we should do it. But if it's easy, we should.

with changefeeds for ordered queries and joins, I could make a really lean integration with Meteor. 👍

@ccorcos -- just FYI, feeds on ordered queries already work. You can already do things like this: r.table('foo').orderBy({index: 'blah'}).limit(10).changes().

Feeds on joins are coming soon (tm).

But those just give changes to the individual documents, right? Are there any notifications indicating that the order has changed?

@ccorcos -- no, they also include notifications that the order has changed. For example, if two documents flip their order, they'll show up in the feed.

Sorry, to clarify -- currently we don't give information about the actual order because the limited number of documents are easy to sort on the client; but we do give all the information you need to do it.
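A sketch of what "easy to sort on the client" can look like: keep the current window keyed by primary key and re-sort the handful of documents after each change. The ordering by a `score` field, highest first, is made up for the example.

```javascript
// Maintain the limited window client-side: apply a change to a map keyed
// by primary key, then re-sort the (small) set of documents.
function applyAndSort(byId, change) {
  if (change.old_val != null) delete byId[change.old_val.id];
  if (change.new_val != null) byId[change.new_val.id] = change.new_val;
  return Object.keys(byId)
    .map(function (k) { return byId[k]; })
    .sort(function (a, b) { return b.score - a.score; }); // highest score first
}

var byId = {7: {id: 7, score: 90}, 23: {id: 23, score: 10}};
var ordered = applyAndSort(byId, {new_val: {id: 14, score: 68},
                                  old_val: {id: 23, score: 10}});
// ordered is [{id: 7, score: 90}, {id: 14, score: 68}]
```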

This shipped in 2.3.0 (see #5334).

🌿

Is there a standard way, on the consumer side, to convert this feed of old_pos, new_pos into the final array a user would want to display?

I guess another way of asking this is the following question: For r.table().order_by().limit(10).changes(), I'd like to use a callback that takes the final updated array, not individual (each()) elements.

I'm just asking because I imagine lots of people are doing this, and maybe there's a function or a separate NPM module that makes it easy to coalesce into a result array.

@GeoffreyPlitt sounds a bit like this is what you're after? rethinkdb/docs#1137 (comment)

The pseudo-code in rethinkdb/docs#1137 (comment) works for set semantics, where the ordering doesn't matter. For an ordered result with old_offset and new_offset fields, the algorithm will look somewhat like this:

// A resizable array holding the current ordered result set
var documents = [];

// `conn` is an open connection to the server.
feed = r.table(...).orderBy(...).limit(...).changes({includeInitial: true, includeStates: true, includeOffsets: true}).run(conn);

feed.each(function (err, change) {
  if (err) throw err;
  if (change.state === "ready") {
    // We have reached a consistent state of the documents in the table.
    // This might be a good time for rendering the results etc.
    // We simply keep consuming the feed to stay updated.
  }
  if (change.old_offset != null) {
    // Remove the old value at index old_offset.
    // (splice "cuts out" the value, reducing the index of every
    //  subsequent element by 1.)
    documents.splice(change.old_offset, 1);
  }
  if (change.new_offset != null) {
    // Insert the new value into the array at position new_offset.
    documents.splice(change.new_offset, 0, change.new_val);
  }
});
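The remove-then-insert logic above can be exercised without a server by feeding it a hand-written sequence of change objects (the documents here are made up for the example):

```javascript
// Apply a hand-written sequence of offset changes to an initially empty
// array, the same way a real feed consumer would.
function coalesce(changes) {
  var documents = [];
  changes.forEach(function (change) {
    if (change.old_offset != null) documents.splice(change.old_offset, 1);
    if (change.new_offset != null) documents.splice(change.new_offset, 0, change.new_val);
  });
  return documents;
}

var result = coalesce([
  // initial results arrive as inserts carrying only a new_offset
  {new_val: {id: 1, score: 50}, old_offset: null, new_offset: 0},
  {new_val: {id: 2, score: 40}, old_offset: null, new_offset: 1},
  // id 3 overtakes id 2: the old row at offset 1 leaves, the new one
  // enters at offset 0
  {new_val: {id: 3, score: 60}, old_val: {id: 2, score: 40},
   old_offset: 1, new_offset: 0}
]);
// result is [{id: 3, score: 60}, {id: 1, score: 50}]
```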