Tweak reads to not require a recordId, and just return the latest created record matching a query filter

csuwildcat opened this issue a year ago · comments

We could tweak Reads to not require a recordId, such that if someone passed a protocol/protocolPath, the Read would just return the most recently added record matching that filter.

Diane Huxley · Answer 1 · Fri Aug 11 2023 04:09:35 GMT+0800 (China Standard Time)

What's the use case for this? This is not an intuitive behavior for a Read to me. This feels like a specific query.

Maybe we're looking for a limit option on RecordsQuery?

Daniel Buchner · Answer 2 · Fri Aug 11 2023 08:34:59 GMT+0800 (China Standard Time)

The use case is so that if I have a record type avatar under a Profile protocol, I can do a Read of Profile Protocol + the avatar protocolPath and know I am getting the bytes for the singular latest record under that bucket.
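As a rough illustration of the proposed behavior, the sketch below picks the most recently created record for a protocol + protocolPath pair from an in-memory list. The `StoredRecord` shape, the `readLatestByPath` helper, and the protocol URI are hypothetical stand-ins, not the SDK's actual types or API.

```typescript
// Hypothetical, simplified record shape (not the SDK's real message type).
interface StoredRecord {
  recordId: string;
  protocol: string;
  protocolPath: string;
  dateCreated: string; // ISO 8601 UTC timestamp, so string comparison orders by time
  data: string;        // stand-in for the record's bytes
}

interface PathFilter {
  protocol: string;
  protocolPath: string;
}

// Return the most recently created record matching the filter, or undefined if none match.
function readLatestByPath(records: StoredRecord[], filter: PathFilter): StoredRecord | undefined {
  let latest: StoredRecord | undefined;
  for (const record of records) {
    if (record.protocol !== filter.protocol || record.protocolPath !== filter.protocolPath) {
      continue;
    }
    if (latest === undefined || record.dateCreated > latest.dateCreated) {
      latest = record;
    }
  }
  return latest;
}

// Usage: two avatar records under an assumed Profile protocol URI.
const exampleProfile = 'https://example.com/protocols/profile';
const exampleRecords: StoredRecord[] = [
  { recordId: 'a', protocol: exampleProfile, protocolPath: 'avatar', dateCreated: '2023-08-01T00:00:00.000Z', data: 'old-bytes' },
  { recordId: 'b', protocol: exampleProfile, protocolPath: 'avatar', dateCreated: '2023-08-10T00:00:00.000Z', data: 'new-bytes' },
];
const exampleLatest = readLatestByPath(exampleRecords, { protocol: exampleProfile, protocolPath: 'avatar' });
console.log(exampleLatest === undefined ? 'no match' : exampleLatest.data);
```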

Diane Huxley · Answer 3 · Sat Aug 12 2023 01:26:02 GMT+0800 (China Standard Time)

Does having limit: 1 or possibly latest: true on queries solve this? If this could be solved with a small addition to RecordsQuery, I don't think we should add it to RecordsRead. My further worry is that it will set a precedent for other query-like parameters on read, muddying the difference between read and query.

@LiranCohen's notes from #470 show that this is already the intention.

There is a potential to add additional fields to the description such as author, recipient, published etc for further filtering here.

I will add contextId as an option when doing a Read based on protocol + protocolPath

Frank Hinek · Answer 4 · Thu Aug 17 2023 03:45:31 GMT+0800 (China Standard Time)

+1 to @diehuxx's comments

Modifying RecordsRead in this way seems like an attempt to work around a limitation in RecordsQuery that ought to be addressed.

This use case also causes me to wonder: If we're adding query parameters to RecordsQuery, should we consider a parameter that returns the data payload regardless of the size if limit: 1?

Henry Tsai · Answer 5 · Thu Aug 17 2023 07:58:42 GMT+0800 (China Standard Time)

I share a similar sentiment with @frankhinek and @diehuxx: this seems like programming sugar to me. It also feels like we are attempting to work around the design decision of a generated recordId to mimic the benefit of a predefined record ID.

Liran Cohen · Answer 6 · Thu Aug 17 2023 20:10:53 GMT+0800 (China Standard Time)

We had some discussion around this during office hours and mentioned that RecordsRead is almost analogous to an HTTP GET req, which fits my mental model as well.

I think the main limitation of using RecordsQuery is not necessarily limiting it to a single record, but rather actually reading the data that comes with that record as @frankhinek pointed out.

Maybe it would be worthwhile to discuss what type of improvements we can make to DataStore in order to support something like this, as we would need to allow streaming of large data with RecordsQuery. If that ends up being the case, it almost seems to eliminate any need for RecordsRead altogether.

I did add the ability for only a parentId to be passed as an additional parameter for the RecordsRead, as I thought it would be useful to read the latest record in a path you know the parent of, ie. game/score where you know the specific game you are looking for.

But, I do agree that we should scrutinize any type of filtering on RecordsRead to avoid it getting out of hand, am open to removing parentId if we decide that it's not as useful for the intended purpose.

Henry Tsai · Answer 7 · Mon Aug 21 2023 11:25:47 GMT+0800 (China Standard Time)

@csuwildcat and @LiranCohen, spec & design consideration: the current PR (#470) returns the latest record when there are multiple child records, and thus relies on a query that can return messages on the order of 100s or even 1000s and is subject to sort/paging. What is the scenario for needing this? If the ask is only to support cases where a protocol path contains only one record, is it okay for us to enforce that the protocol path being read contains only one record?

Liran Cohen · Answer 8 · Tue Aug 22 2023 03:23:41 GMT+0800 (China Standard Time)

@thehenrytsai from my perspective this is really for reading the latest record from any path regardless of how it's configured.

I view this as a cleaner way of doing Query + Read when you know you just need the single latest record + data for a path.

Would definitely like to get @csuwildcat's input into that.

Daniel Buchner · Answer 9 · Tue Aug 22 2023 04:59:09 GMT+0800 (China Standard Time)

I think the whole point here is to return the last record's data, as Read normally would, under a path if there is no recordId. I don't see any way this would result in unexpected behavior, and honestly, having it flip back and forth between working when there is 1 record and ceasing to work if there are 1+N seems like the strangest, most broken behavior of all.

Henry Tsai · Answer 10 · Tue Aug 22 2023 11:35:29 GMT+0800 (China Standard Time)

@csuwildcat and @LiranCohen:

"having it flip back and forth between working when there is 1 record and ceasing to work if there are 1+N seems like the strangest, most broken behavior of all."

It's not broken at all if a singleton published Profile/Image record (or the like) is the only scenario we are looking to support; this is the only scenario I was told about. Hence, I'm still looking for a straight answer on the scenarios that need the latest record when there are multiple records that:

belong to different parents
belong to the same parent

Should be trivial to answer if the need is there.

The behavior of the current PR (if I read the code correctly):

Say path foo/bar has 1,000,000 bar RecordsWrites: when handling a RecordsRead on foo/bar, the code will fetch all 1,000,000 messages to the client side, then find the latest message, then perform AuthZ. Is no one else concerned that this is inefficient? If not, why not? The "1 record" check is only my attempt to help the PR get to a minimum-bar mergeable state without requiring sorting/paging, which would be a much larger PR. If "1 record" is a no-go, that's fine too; please educate me.
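To make the efficiency concern concrete, the sketch below contrasts materializing every matching message before picking the latest with a single pass that only ever holds one candidate, which is roughly what a store-level limit of 1 would allow. Both helpers and the `PathMessage` shape are hypothetical; neither is taken from the PR.

```typescript
// Hypothetical message shape for illustration only.
interface PathMessage {
  recordId: string;
  protocolPath: string;
  dateCreated: string; // ISO 8601 UTC timestamp
}

interface LatestResult {
  latest: PathMessage | undefined;
  held: number; // how many matching messages were held in memory at once
}

// Approach described in the comment: collect every match, sort, take the first.
function latestViaFetchAll(messages: PathMessage[], path: string): LatestResult {
  const matches: PathMessage[] = [];
  for (const message of messages) {
    if (message.protocolPath === path) {
      matches.push(message);
    }
  }
  // Sort newest first.
  matches.sort((a, b) => (a.dateCreated < b.dateCreated ? 1 : a.dateCreated > b.dateCreated ? -1 : 0));
  return { latest: matches[0], held: matches.length };
}

// Alternative: keep only the best candidate seen so far.
function latestViaSinglePass(messages: PathMessage[], path: string): LatestResult {
  let latest: PathMessage | undefined;
  for (const message of messages) {
    if (message.protocolPath === path && (latest === undefined || message.dateCreated > latest.dateCreated)) {
      latest = message;
    }
  }
  return { latest, held: latest === undefined ? 0 : 1 };
}
```

Both functions still scan the whole list in this in-memory sketch; the difference a real store could exploit is an index ordered by date, so that only one row is ever returned to the handler.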

Also, what are the scenarios for reading an unpublished latest record without a recordId (which is currently allowed in the PR)?

Continuing the above example, say the 1,000,000 bars each have a different foo parent and the latest bar is NOT published: a RecordsRead would return the latest bar only if the requester happens to be the recipient, or the author, or satisfies the protocol/grant auth rules. Is this the intended behavior? If so, how so? Again, just looking for clarification on scenarios since there isn't a spec on this stuff.

Daniel Buchner · Answer 11 · Tue Aug 22 2023 12:03:23 GMT+0800 (China Standard Time)

@thehenrytsai the primary need is to support a read-by-path-in-context to get the file that resides at a given path within a context. This will allow basic DEST http queries that mirror traditional REST behavior on GET of a given path. To do this we'd need to return the latest file, as that's the implicit expectation of a GET on a singular path, like "/profile/avatar" would return 1 image binary payload in an HTTP body. I personally didn't care about curtailing the call to only work if there was a single file, because it just didn't seem to matter, but your point about performance is a good one. Do we have any type of query that will just get the latest record by last written?

Diane Huxley · Answer 12 · Tue Aug 22 2023 14:07:11 GMT+0800 (China Standard Time)

@csuwildcat Though I initially found the comparison to GET interesting, I'm convinced that returning the most recently updated record is a conflation of the object relational model with a way to "publish" data. If you're immovable on the idea of RecordsReading a protocolPath, we should explore options that separate the two concerns. Off the top of my head: we could add a "highlighted" field to RecordsWrite, where records of a given protocolPath may only have one record with highlighted: true and the highlighted record is the one returned when reading by protocolPath. I'm sure there are even better solutions we could come up with if we start back from the scenarios you want to support.

Henry Tsai · Answer 13 · Tue Aug 22 2023 14:33:05 GMT+0800 (China Standard Time)

@csuwildcat,

"read-by-path-in-context to get the file that resides at a given path within a context"

Can you clarify the above? I am probably interpreting it incorrectly: the current PR does NOT support filtering by contextId in any way. When a protocol path is supplied in a RecordsRead, it fetches all records having that path across all contexts to find the latest one. Hence my final paragraph in the previous comment: unless the records are all "published", not everyone can read the result (most probably can't), which is odd behavior.

Daniel Buchner · Answer 14 · Tue Aug 22 2023 19:12:44 GMT+0800 (China Standard Time)

@thehenrytsai I misspoke about contextual specificity - it's just path-based. Yes, we want a protocol path query to return the latest file under it that is published.

Daniel Buchner · Answer 15 · Tue Aug 22 2023 19:26:22 GMT+0800 (China Standard Time)

I'm just going to restate the requirement/goal and let that determine the course: we need a Read that responds to a path-centric fetch the way a GET would pull THE file (notice the singular) at the path example.com/company/logo, such that the actual bytes of the logo image are returned from the invocation, not just the json metadata message. Users don't want to deal with any of the juggling, they want the file at paths like example.com/pages/home to return THE home page HTML file, and not being able to do so without contortions or multiple calls is a poor developer experience.

Liran Cohen · Answer 16 · Tue Aug 22 2023 22:29:03 GMT+0800 (China Standard Time)

I agree that this greatly improves developer experience.

Wrt performance, there is really no way to get around that until we add pagination to MessageStore. Even without this feature the user would perform a RecordsQuery on foo/bar which could return 1,000,000 records just to get the latest recordId and perform a RecordsRead to get the data.

@csuwildcat There is one remaining question/requirement that @diehuxx brought up during review.

When performing a RecordsRead on a path, do we expect to get the most recently CREATED or most recently UPDATED record?
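The two interpretations can give different answers, as the sketch below shows: a record created earlier but updated recently wins under "most recently updated" and loses under "most recently created". The `TimestampedRecord` shape and its `dateCreated`/`dateModified` field names are assumptions made for illustration.

```typescript
// Hypothetical record shape; the timestamp field names are assumed for this sketch.
interface TimestampedRecord {
  recordId: string;
  dateCreated: string;  // ISO 8601 UTC timestamp of the initial write
  dateModified: string; // ISO 8601 UTC timestamp of the latest update
}

// Pick the record with the greatest value for the chosen timestamp field.
function latestBy(
  records: TimestampedRecord[],
  key: 'dateCreated' | 'dateModified'
): TimestampedRecord | undefined {
  let best: TimestampedRecord | undefined;
  for (const record of records) {
    if (best === undefined || record[key] > best[key]) {
      best = record;
    }
  }
  return best;
}

const exampleScores: TimestampedRecord[] = [
  // Created first, but edited after the second record was created.
  { recordId: 'first', dateCreated: '2023-08-01T00:00:00.000Z', dateModified: '2023-08-20T00:00:00.000Z' },
  { recordId: 'second', dateCreated: '2023-08-10T00:00:00.000Z', dateModified: '2023-08-10T00:00:00.000Z' },
];

const exampleByCreated = latestBy(exampleScores, 'dateCreated');   // selects 'second'
const exampleByUpdated = latestBy(exampleScores, 'dateModified');  // selects 'first'
console.log(exampleByCreated?.recordId, exampleByUpdated?.recordId);
```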

Henry Tsai · Answer 17 · Wed Aug 23 2023 07:02:51 GMT+0800 (China Standard Time)

@csuwildcat, thanks for clarification!

"we want a protocol path query to return the latest file under it that is published."

The current implementation also does not filter on published; as long as a record is the latest, it gets returned.

Your reiteration of the feature goal seems to reaffirm my original understanding (unless there are further changes in today's office hour):

You are mainly interested in enabling the fetching of the record data of a particular protocol path that is expected to be published and singleton for a given protocol.

I am committed to supporting the scenario you described above and was reviewing the current PR with that understanding, but am not interested in adding unclear/unspecified behavior beyond that, which is great and as it should be IMO.

@LiranCohen, you are right, having a proper limit on fetch is the way to go, everything else seems like a temporary bandaid or hack!

Diane Huxley · Answer 18 · Wed Aug 23 2023 08:16:11 GMT+0800 (China Standard Time)

One idea @csuwildcat and I tossed around in office hours today: Requiring parentId in addition to protocol + protocolPath.

Spec

A RecordsRead must have exactly one of the following

A recordId.
A protocol + protocolPath. If the protocolPath is a root record, then parentId is prohibited. Otherwise, parentId is required.
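The rule above could be checked with something like the following sketch. The `ReadDescriptor` shape and `validateRecordsRead` are hypothetical, and the sketch assumes a root protocolPath is one with no "/" separator; how parentId should behave alongside a recordId is not stated in the proposal, so it is left unconstrained here.

```typescript
// Hypothetical descriptor shape for a RecordsRead; not the SDK's actual type.
interface ReadDescriptor {
  recordId?: string;
  protocol?: string;
  protocolPath?: string;
  parentId?: string;
}

// Returns an error message if the descriptor violates the proposed rule, otherwise undefined.
function validateRecordsRead(descriptor: ReadDescriptor): string | undefined {
  const path = descriptor.protocolPath;

  if (descriptor.recordId !== undefined) {
    // "Exactly one of": a recordId excludes the protocol + protocolPath form.
    if (descriptor.protocol !== undefined || path !== undefined) {
      return 'recordId cannot be combined with protocol/protocolPath';
    }
    return undefined;
  }

  if (descriptor.protocol === undefined || path === undefined) {
    return 'either recordId or both protocol and protocolPath are required';
  }

  // Assumption: a root record's protocolPath has no "/" separator.
  const isRootPath = path.indexOf('/') === -1;
  if (isRootPath && descriptor.parentId !== undefined) {
    return 'parentId is prohibited for a root protocolPath';
  }
  if (!isRootPath && descriptor.parentId === undefined) {
    return 'parentId is required for a non-root protocolPath';
  }
  return undefined;
}
```

For example, under these assumptions `{ protocol, protocolPath: 'game/score', parentId }` passes, while the same descriptor without parentId is rejected.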

Rationale

A record cannot be uniquely identified by a protocolPath even if that record is a singleton*, and pulling the most recent record for a given protocolPath produces undesirable edge cases. We CAN uniquely identify a singleton record by its parentId and protocolPath.

*The current design for singleton AFAIU boils down to "one record of this type for a given parentId" (#467). Personally I like that design, but there's still active debate.

Henry Tsai · Answer 19 · Wed Aug 23 2023 13:41:58 GMT+0800 (China Standard Time)

Adding extra requirement from @csuwildcat to support/consider:

There is a desire to NOT require/allow parentId as long as the path leads to a global singleton (only one record in the entire protocol).

Two obvious ways to implement this when recordId is not given:

1. Introduce and implement concept of limit in protocol configuration and fetch the protocol config. Verify that limit: 1 is specified in every layer of the hierarchy matching protocol path given in the RecordsRead.
2. Retrieve record directly from the store and confirm that the record count for each layer of the path is 1.
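A minimal sketch of the second approach (counting records per layer) might look like this, assuming each layer of a path such as profile/avatar is itself a protocolPath (profile, then profile/avatar). The `LayeredRecord` shape and both helper names are hypothetical.

```typescript
// Hypothetical record shape for illustration only.
interface LayeredRecord {
  recordId: string;
  protocol: string;
  protocolPath: string;
}

// 'a/b/c' -> ['a', 'a/b', 'a/b/c']
function pathLayers(protocolPath: string): string[] {
  const segments = protocolPath.split('/');
  return segments.map((_segment, index) => segments.slice(0, index + 1).join('/'));
}

// True only if every layer of the path holds exactly one record in the given protocol.
function isGlobalSingletonPath(records: LayeredRecord[], protocol: string, protocolPath: string): boolean {
  return pathLayers(protocolPath).every((layer) => {
    const count = records.filter((r) => r.protocol === protocol && r.protocolPath === layer).length;
    return count === 1;
  });
}
```

This in-memory version filters the full list once per layer; a real store would presumably answer each layer with a count query instead.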

I think the implementor can decide the approach. But short term, approach 2 seems quicker to ship (not necessarily more performant), because it does not depend on yet another potentially large discussion/feature/PR ($limit).

Henry Tsai · Answer 20 · Thu Aug 24 2023 09:57:24 GMT+0800 (China Standard Time)

While I was putting myself to sleep last night thinking about this stuff, it occurred to me that a simple yet generalized spec for RecordsRead could be to:

Support the same filter as RecordsQuery would, but returning success (record + data) only if the query returns exactly 1 record. Error out otherwise.
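A sketch of that "exactly one match or error" rule, under stated assumptions: `QueryableRecord`, `RecordFilter`, and `readExactlyOne` are hypothetical names, the filter covers only a few illustrative fields rather than the full RecordsQuery filter, and the error details are invented for the example.

```typescript
// Hypothetical record and filter shapes for illustration only.
interface QueryableRecord {
  recordId: string;
  protocol: string;
  protocolPath: string;
  parentId?: string;
  data: string; // stand-in for the record's bytes
}

interface RecordFilter {
  recordId?: string;
  protocol?: string;
  protocolPath?: string;
  parentId?: string;
}

type ReadOutcome =
  | { ok: true; record: QueryableRecord }
  | { ok: false; detail: string };

// A record matches when every field present in the filter equals the record's value.
function matchesFilter(record: QueryableRecord, filter: RecordFilter): boolean {
  return (
    (filter.recordId === undefined || record.recordId === filter.recordId) &&
    (filter.protocol === undefined || record.protocol === filter.protocol) &&
    (filter.protocolPath === undefined || record.protocolPath === filter.protocolPath) &&
    (filter.parentId === undefined || record.parentId === filter.parentId)
  );
}

// Succeed (record + data) only when the filter selects exactly one record.
function readExactlyOne(records: QueryableRecord[], filter: RecordFilter): ReadOutcome {
  const matches = records.filter((record) => matchesFilter(record, filter));
  if (matches.length === 0) {
    return { ok: false, detail: 'no record matches the filter' };
  }
  const only = matches[0];
  if (matches.length > 1 || only === undefined) {
    return { ok: false, detail: 'filter matches ' + matches.length + ' records; expected exactly 1' };
  }
  return { ok: true, record: only };
}
```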

This is rather intuitive to understand IMO, flexible, and would render discussion around parentId, latest record, limit etc mostly moot. It also seems to meet all use cases discussed so far.

Am I onto something, or do I just need more sleep, @csuwildcat, @diehuxx, @LiranCohen?

Liran Cohen · Answer 21 · Thu Aug 24 2023 22:39:35 GMT+0800 (China Standard Time)

I talked to @diehuxx yesterday and she caught me up on the latest ideas that were thrown around.

I'm still digesting the idea of failing if the read has more than one result.

From my understanding the main intent with having it fail is to prevent some sort of "foot gun" where a user gets back a result that wasn't what they intended to get. So having ANY result at all lets the user know that there was only 1 result to begin with.

I think this has some merit, it makes the intent really clear, but very limited.

So this would be useful for things like profile and profile/avatar, but less useful for things like game/score or stock/tick.

@thehenrytsai I tossed around that idea initially, of just allowing users to optionally include any filters that are available in RecordsQuery, I don't think that's really bad in any way... it gives users full control, I'm just still unsure about the failure aspect.

I do think in the 'real world' examples I've been running through my head, parentId seems to be most useful: you could perform a RecordsQuery for a list of games and then, having the parentId of the game you want, RecordsRead the latest score of that game on demand.

But that's not a hill I'm willing to die on, just still digesting the idea of failing if there is more than just a single record and what use-cases that satisfies.

Henry Tsai · Answer 22 · Sat Sep 09 2023 03:20:16 GMT+0800 (China Standard Time)

Some might not believe it but this is now done by #470!!