Tweak reads to not require a recordId, and just return the latest created record matching a query filter
csuwildcat opened this issue
We could tweak Reads to not require a recordId, such that if someone passed a protocol/protocolPath it would just return the latest record to be added for that indicated filter.
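A minimal sketch of the proposed behavior, assuming a simplified in-memory record shape. StoredRecord, ReadFilter, and readRecord are hypothetical names for illustration, not the dwn-sdk-js API. A read targets either a recordId or a protocol + protocolPath. In the second case it returns the most recently created match.

```typescript
// Hypothetical, simplified record shape; real DWN messages carry far more fields.
interface StoredRecord {
  recordId: string;
  protocol: string;
  protocolPath: string;
  dateCreated: string; // ISO 8601 timestamp
  data: string;
}

// A read targets either a recordId or a protocol + protocolPath (illustrative type).
type ReadFilter =
  | { recordId: string }
  | { protocol: string; protocolPath: string };

function readRecord(store: StoredRecord[], filter: ReadFilter): StoredRecord | undefined {
  if ("recordId" in filter) {
    const id = filter.recordId;
    return store.find((r) => r.recordId === id);
  }
  // No recordId: scan the matches and keep the most recently created one.
  let latest: StoredRecord | undefined;
  for (const r of store) {
    if (r.protocol !== filter.protocol || r.protocolPath !== filter.protocolPath) continue;
    if (latest === undefined || Date.parse(r.dateCreated) > Date.parse(latest.dateCreated)) {
      latest = r;
    }
  }
  return latest;
}
```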
What's the use case for this? This is not intuitive behavior for a Read to me; it feels like a specific query. Maybe we're looking for a limit option on RecordsQuery?
The use case is so that if I have a record type avatar under a Profile protocol, I can do a Read of Profile Protocol + the avatar protocolPath and know I am getting the bytes for the singular latest record under that bucket.
Does having limit: 1 or possibly latest: true on queries solve this? If this could be solved with a small addition to RecordsQuery, I don't think we should add it to RecordsRead. My further worry is that it will set a precedent for other query-like parameters on read, muddying the difference between read and query.
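To make the limit: 1 / latest: true suggestion concrete, here is a small in-memory sketch. Both options are hypothetical here; neither is described in this thread as existing on RecordsQuery. The sketch also assumes results are sorted newest first. Under that ordering, latest: true is just shorthand for limit: 1.

```typescript
interface StoredRecord {
  recordId: string;
  protocolPath: string;
  dateCreated: string; // ISO 8601 timestamp
}

// Hypothetical query options (not an existing RecordsQuery API).
interface QueryOptions {
  protocolPath: string;
  limit?: number;   // maximum number of results to return
  latest?: boolean; // shorthand for "newest first, limit 1"
}

function queryRecords(store: StoredRecord[], opts: QueryOptions): StoredRecord[] {
  const matches = store
    .filter((r) => r.protocolPath === opts.protocolPath)
    // Assumed ordering: newest first, so limit: 1 yields the most recently created record.
    .sort((a, b) => Date.parse(b.dateCreated) - Date.parse(a.dateCreated));
  const limit = opts.latest ? 1 : opts.limit;
  return limit === undefined ? matches : matches.slice(0, limit);
}
```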
@LiranCohen's notes from #470 show that this is already the intention.
There is potential to add additional fields to the descriptor, such as author, recipient, published, etc., for further filtering here.
I will add contextId as an option when doing a Read based on protocol + protocolPath
+1 to @diehuxx's comments.
Modifying RecordsRead in this way seems like an attempt to work around a limitation in RecordsQuery that ought to be addressed.
This use case also causes me to wonder: if we're adding query parameters to RecordsQuery, should we consider a parameter that returns the data payload, regardless of its size, if limit: 1?
I share a similar sentiment to @frankhinek and @diehuxx: this seems like syntactic sugar to me. It also feels like we are attempting to work around the design decision of generated recordIds to mimic the benefit of predefined record IDs.
We had some discussion around this during office hours, and it was mentioned that RecordsRead is almost analogous to an HTTP GET request, which fits my mental model as well.
I think the main limitation of using RecordsQuery is not necessarily limiting it to a single record, but rather actually reading the data that comes with that record, as @frankhinek pointed out.
Maybe it would be worthwhile to discuss what type of improvements we can make to DataStore in order to support something like this, as we would need to allow streaming of large data with RecordsQuery. If that ends up being the case, it almost seems to eliminate any need for RecordsRead altogether.
I did add the ability to pass only a parentId as an additional parameter to RecordsRead, as I thought it would be useful for reading the latest record in a path whose parent you know, e.g. game/score where you know the specific game you are looking for.
But I do agree that we should scrutinize any type of filtering on RecordsRead to keep it from getting out of hand, and I am open to removing parentId if we decide that it's not as useful for the intended purpose.
@csuwildcat and @LiranCohen, a spec & design consideration: the current PR (#470) returns the latest record when there are multiple child records, and thus relies on a query that can return messages on the order of 100s or even 1000s, which is subject to sorting/paging. What is the scenario that needs this? If the ask is only to support cases where a protocol path contains only one record, is it okay for us to enforce that the protocol path being read contains only one record?
@thehenrytsai from my perspective this is really for reading the latest record from any path regardless of how it's configured.
I view this as a cleaner way of doing Query + Read when you know you just need the single latest record + data for a path.
Would definitely like to get @csuwildcat's input into that.
I think the whole point here is to return the latest record's data under a path, as Read normally would, when there is no recordId. I don't see any way this would result in unexpected behavior, and honestly, having it flip back and forth between working when there is 1 record and ceasing to work when there are 1+N seems like the strangest, most broken behavior of all.
@csuwildcat and @LiranCohen:
"having it flip back and forth between working when there is 1 record and ceasing to work if there are 1+N seems like the strangest, most broken behavior of all."
It's not broken at all if a singleton published Profile/Image record (or the like) is the only scenario we are looking to support, and that is the only scenario I was told about. Hence, I'm still looking for a straight answer on the scenarios that need the latest record when there are multiple records that:
- belong to different parents
- belong to the same parent
Should be trivial to answer if the need is there.
The behavior of the current PR (if I read the code correctly):
Say path foo/bar has 1,000,000 bar RecordsWrites. When handling a RecordsRead on foo/bar, the code will fetch all 1,000,000 messages to the client side, then find the latest message, then perform AuthZ. Is no one else concerned that this is inefficient? If not, why not? The "1 record" check is only my attempt to help the PR reach a minimum-bar mergeable state without requiring sorting/paging, which would be a much larger PR. If "1 record" is a no-go, that's fine too; please educate me.
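One reason the "1 record" check is cheaper than finding the latest record: it can stop as soon as it sees a second match. The sketch below assumes a hypothetical lazy store cursor, simulated with a generator. This is an illustration, not the MessageStore API. The sketch never needs to pull all 1,000,000 messages to reject a read.

```typescript
interface StoredRecord {
  recordId: string;
  protocolPath: string;
}

// Hypothetical store cursor: yields matching records lazily instead of loading them all.
function* scan(store: StoredRecord[], protocolPath: string): Generator<StoredRecord> {
  for (const r of store) {
    if (r.protocolPath === protocolPath) yield r;
  }
}

// "1 record" check: bail out on the second match rather than scanning everything.
function readOnlyRecord(store: StoredRecord[], protocolPath: string): StoredRecord {
  let found: StoredRecord | undefined;
  for (const r of scan(store, protocolPath)) {
    if (found !== undefined) {
      throw new Error(`more than one record at ${protocolPath}`);
    }
    found = r;
  }
  if (found === undefined) throw new Error(`no record at ${protocolPath}`);
  return found;
}
```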
Also, what are the scenarios for reading an unpublished latest record without a recordId (which is currently allowed in the PR)?
Continuing the example above, say the 1,000,000 bars all have different foo parents and the latest bar is NOT published. A RecordsRead would then return the latest bar only if the requester happens to be the recipient or the author, or satisfies the protocol/grant auth rules. Is this the intended behavior? If so, why? Again, I'm just looking for clarification on scenarios, since there isn't a spec for this stuff.
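The sketch below models the behavior described above, assuming a simplified authorization check. Only published / author / recipient are considered. Protocol and grant rules are deliberately omitted, and all names are hypothetical. Because the latest record is picked before authorization, a requester who cannot read it gets nothing, even when an older published record exists.

```typescript
interface StoredRecord {
  recordId: string;
  author: string;
  recipient?: string;
  published: boolean;
  dateCreated: string; // ISO 8601 timestamp
}

// Simplified check: protocol and grant rules are left out of this sketch.
function canRead(record: StoredRecord, requester: string | undefined): boolean {
  if (record.published) return true;
  return requester !== undefined && (requester === record.author || requester === record.recipient);
}

// Pick the latest record first, then authorize only that one.
function readLatestThenAuthorize(records: StoredRecord[], requester?: string): StoredRecord | undefined {
  const latest = records.reduce<StoredRecord | undefined>(
    (acc, r) => (acc === undefined || Date.parse(r.dateCreated) > Date.parse(acc.dateCreated) ? r : acc),
    undefined,
  );
  return latest !== undefined && canRead(latest, requester) ? latest : undefined;
}
```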
@thehenrytsai the primary need is to support a read-by-path-in-context to get the file that resides at a given path within a context. This will allow basic DEST HTTP queries that mirror traditional REST behavior on a GET of a given path. To do this, we'd need to return the latest file, as that's the implicit expectation of a GET on a singular path; for example, "/profile/avatar" would return 1 image binary payload in an HTTP body. I personally didn't care about restricting the call to work only when there is a single file, because it just didn't seem to matter, but your point about performance is a good one. Do we have any type of query that will just get the latest record by last-written time?
@csuwildcat Though I initially found the comparison to GET interesting, I'm convinced that returning the most recently updated record conflates the object-relational model with a way to "publish" data. If you're immovable on the idea of RecordsReading a protocolPath, we should explore options that separate the two concerns. Off the top of my head: we could add a "highlighted" field to RecordsWrite, where a given protocolPath may have only one record with highlighted: true, and the highlighted record is the one returned when reading by protocolPath. I'm sure there are even better solutions we could come up with if we start back from the scenarios you want to support.
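A rough sketch of the "highlighted" idea, assuming a hypothetical highlighted flag on stored records. This flag is not part of any DWN spec. The "only one highlighted record per protocolPath" rule is enforced at write time by demoting the previously highlighted record. Rejecting the write instead would be an equally valid design choice.

```typescript
// Hypothetical "highlighted" flag, as floated above; not part of any DWN spec.
interface StoredRecord {
  recordId: string;
  protocolPath: string;
  highlighted: boolean;
}

class HighlightStore {
  private records: StoredRecord[] = [];

  // Keep at most one highlighted record per protocolPath by demoting the previous one.
  write(record: StoredRecord): void {
    if (record.highlighted) {
      for (const r of this.records) {
        if (r.protocolPath === record.protocolPath) r.highlighted = false;
      }
    }
    this.records.push(record);
  }

  // Reading by protocolPath returns the highlighted record, independent of recency.
  readByPath(protocolPath: string): StoredRecord | undefined {
    return this.records.find((r) => r.protocolPath === protocolPath && r.highlighted);
  }
}
```

This separates "which record is published at this path" from "which record was written last": a newer, non-highlighted write does not change what a path read returns.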
"read-by-path-in-context to get the file that resides at a given path within a context"
Can you clarify the above? I am probably interpreting it incorrectly: the current PR does NOT support filtering by contextId in any way. When a protocol path is supplied in a RecordsRead, it fetches all records having that path across all contexts to find the latest one. Hence the final paragraph of my previous comment: unless the records are all "published", not everyone can read the latest one (most probably can't), which is odd behavior.
@thehenrytsai I misspoke about contextual specificity; it's just path-based. Yes, we want a protocol path query to return the latest file under it that is published.
I'm just going to restate the requirement/goal and let that determine the course: we need a Read that responds to a path-centric fetch the way a GET would pull THE file (notice the singular) at the path example.com/company/logo, such that the actual bytes of the logo image are returned from the invocation, not just the JSON metadata message. Users don't want to deal with any of the juggling; they want paths like example.com/pages/home to return THE home page HTML file, and not being able to do so without contortions or multiple calls is a poor developer experience.
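A sketch of the GET-like semantics described above, under several assumptions. The URL path maps directly to a protocolPath. Protocol selection (for example, from the host) is ignored. The newest record wins. The response body is the record's bytes rather than its JSON metadata. StoredRecord, GetResponse, and handleGet are hypothetical names.

```typescript
// Hypothetical record shape with raw bytes; names are illustrative only.
interface StoredRecord {
  protocolPath: string;
  dateCreated: string; // ISO 8601 timestamp
  dataFormat: string;  // e.g. "image/png"
  data: Uint8Array;
}

interface GetResponse {
  status: number;
  contentType?: string;
  body?: Uint8Array;
}

function handleGet(store: StoredRecord[], urlPath: string): GetResponse {
  // "/company/logo" -> "company/logo" (strip leading/trailing slashes).
  const protocolPath = urlPath.replace(/^\/+|\/+$/g, "");
  let latest: StoredRecord | undefined;
  for (const r of store) {
    if (r.protocolPath !== protocolPath) continue;
    if (latest === undefined || Date.parse(r.dateCreated) > Date.parse(latest.dateCreated)) latest = r;
  }
  if (latest === undefined) return { status: 404 };
  // Return THE file: its bytes and content type, not the metadata message.
  return { status: 200, contentType: latest.dataFormat, body: latest.data };
}
```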
I agree that this greatly improves developer experience.
With respect to performance, there is really no way around that until we add pagination to MessageStore. Even without this feature, the user would perform a RecordsQuery on foo/bar, which could return 1,000,000 records, just to get the latest recordId and then perform a RecordsRead to get the data.
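For comparison, here is the two-step workaround described above, sketched against a hypothetical, simplified client interface. The query and read signatures are illustrative, not the dwn-sdk-js API. The full result set still has to come back before the latest recordId is known.

```typescript
// Hypothetical, simplified client; these signatures are illustrative only.
interface RecordEntry {
  recordId: string;
  protocolPath: string;
  dateCreated: string; // ISO 8601 timestamp
}

interface DwnClient {
  query(protocolPath: string): RecordEntry[];     // metadata only, no data payload
  read(recordId: string): Uint8Array | undefined; // data for one record
}

// Query everything at the path, pick the latest entry, then read its data.
function readLatestViaQuery(client: DwnClient, protocolPath: string): Uint8Array | undefined {
  const entries = client.query(protocolPath); // without pagination, possibly 1,000,000 entries
  if (entries.length === 0) return undefined;
  const latest = entries.reduce((a, b) => (Date.parse(b.dateCreated) > Date.parse(a.dateCreated) ? b : a));
  return client.read(latest.recordId);
}
```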
@csuwildcat There is one remaining question/requirement that @diehuxx brought up during review.
When performing a RecordsRead on a path, do we expect to get the most recently CREATED or most recently UPDATED record?
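The CREATED vs. UPDATED question changes which record wins. A minimal sketch, assuming two hypothetical timestamps per record. dateCreated is fixed at first write. lastUpdated is an illustrative name for a timestamp that changes on every later write.

```typescript
// Hypothetical record shape; "lastUpdated" is an illustrative field name.
interface StoredRecord {
  recordId: string;
  dateCreated: string; // set once, when the record is first written
  lastUpdated: string; // changes on every subsequent write
}

type Recency = "created" | "updated";

function pickLatest(records: StoredRecord[], by: Recency): StoredRecord | undefined {
  const key = (r: StoredRecord): number =>
    Date.parse(by === "created" ? r.dateCreated : r.lastUpdated);
  let latest: StoredRecord | undefined;
  for (const r of records) {
    if (latest === undefined || key(r) > key(latest)) latest = r;
  }
  return latest;
}
```

For example, with an older record that was recently edited and a newer record that was never edited, "created" picks the newer record while "updated" picks the edited one.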
@csuwildcat, thanks for the clarification!
"we want a protocol path query to return the latest file under it that is published."
The current implementation also does not filter on published: as long as a record is the latest, it gets returned.
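"Latest published" semantics would filter on published before choosing the latest record. In this hypothetical sketch, a newer unpublished record therefore never hides an older published one. This is the opposite order from picking the latest record first and filtering afterwards.

```typescript
interface StoredRecord {
  recordId: string;
  published: boolean;
  dateCreated: string; // ISO 8601 timestamp
}

// Filter on published first, then keep the most recently created survivor.
function readLatestPublished(records: StoredRecord[]): StoredRecord | undefined {
  let latest: StoredRecord | undefined;
  for (const r of records) {
    if (!r.published) continue;
    if (latest === undefined || Date.parse(r.dateCreated) > Date.parse(latest.dateCreated)) latest = r;
  }
  return latest;
}
```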
Your reiteration of the feature goal seems to reaffirm my original understanding (unless there were further changes in today's office hours): you are mainly interested in enabling fetching the record data at a particular protocol path that is expected to be published and a singleton for a given protocol.
I am committed to supporting the scenario you described above and was reviewing the current PR with that understanding, but I am not interested in adding unclear/unspecified behavior beyond that, which is great and as it should be IMO.
@LiranCohen, you are right: having a proper limit on fetch is the way to go; everything else seems like a temporary band-aid or hack!
One idea @csuwildcat and I tossed around in office hours today: requiring parentId in addition to protocol + protocolPath.
Spec
A RecordsRead must have exactly one of the following:
- A recordId.
- A protocol + protocolPath. If the protocolPath is a root record, then parentId is prohibited. Otherwise, parentId is required.
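The proposed rule can be sketched as a validator over a hypothetical filter shape. ReadFilter and validateReadFilter are illustrative names, not the real RecordsRead schema. The sketch assumes a root protocolPath is one with a single segment (e.g. "game"), while child paths look like "game/score". The spec above says nothing about parentId alongside a recordId, so this sketch leaves that case unchecked.

```typescript
// Hypothetical RecordsRead filter shape; not the real schema.
interface ReadFilter {
  recordId?: string;
  protocol?: string;
  protocolPath?: string;
  parentId?: string;
}

// Returns an error message, or undefined if the filter satisfies the proposed spec.
function validateReadFilter(f: ReadFilter): string | undefined {
  const byId = f.recordId !== undefined;
  const byPath = f.protocol !== undefined || f.protocolPath !== undefined;
  if (byId === byPath) return "exactly one of recordId or protocol + protocolPath is required";
  if (byId) return undefined;
  if (f.protocol === undefined || f.protocolPath === undefined) {
    return "protocol and protocolPath must be given together";
  }
  // Assumption: a root protocolPath has a single segment, e.g. "game".
  const isRoot = !f.protocolPath.includes("/");
  if (isRoot && f.parentId !== undefined) return "parentId is prohibited for a root protocolPath";
  if (!isRoot && f.parentId === undefined) return "parentId is required for a non-root protocolPath";
  return undefined;
}
```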
Rationale
A record cannot be uniquely identified by a protocolPath, even if that record is a singleton*, and pulling the most recent record for a given protocolPath produces undesirable edge cases. We CAN uniquely identify a singleton record by its parentId and protocolPath.
*The current design for singletons, AFAIU, boils down to "one record of this type for a given parentId" (#467). Personally I like that design, but there's still active debate.
Adding an extra requirement from @csuwildcat to support/consider:
There is a desire to NOT require/allow parentId as long as the path leads to a global singleton (only one record in the entire protocol).
Two obvious ways of implementing this when recordId is not given:
- Introduce and implement the concept of limit in the protocol configuration, and fetch the protocol config. Verify that limit: 1 is specified in every layer of the hierarchy matching the protocol path given in the RecordsRead.
- Retrieve the records directly from the store and confirm that the record count for each layer of the path is 1.
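A sketch of the first approach (checking the protocol configuration). It assumes a hypothetical nested protocol structure in which each layer may carry a $limit directive, the spelling used later in this thread. $limit is a proposal, not an existing directive. The path counts as a global singleton only if every layer along it declares $limit: 1.

```typescript
// Hypothetical protocol structure node; "$limit" is the proposed (not yet existing) directive.
interface RuleSet {
  $limit?: number;
  [childType: string]: RuleSet | number | undefined;
}

// Walk the structure along the path and require $limit: 1 at every layer.
function isGlobalSingletonPath(structure: RuleSet, protocolPath: string): boolean {
  let node: RuleSet = structure;
  for (const segment of protocolPath.split("/")) {
    const child = node[segment];
    if (typeof child !== "object") return false; // path not defined in the protocol
    if (child.$limit !== 1) return false;
    node = child;
  }
  return true;
}
```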
I think the implementer can decide the approach. But in the short term, approach 2 seems quicker to ship (not necessarily more performant), because it does not depend on yet another potentially large discussion/feature/PR ($limit).
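A sketch of approach 2 over a hypothetical in-memory store. For a path such as "a/b/c", it confirms that exactly one record exists at "a", at "a/b", and at "a/b/c", and then returns the record at the full path. It needs no protocol configuration. It does count records at each layer, which is where the performance caveat comes from.

```typescript
interface StoredRecord {
  recordId: string;
  protocol: string;
  protocolPath: string;
}

function readGlobalSingleton(store: StoredRecord[], protocol: string, protocolPath: string): StoredRecord | undefined {
  const segments = protocolPath.split("/");
  let match: StoredRecord | undefined;
  for (let i = 1; i <= segments.length; i++) {
    const layerPath = segments.slice(0, i).join("/");
    const layer = store.filter((r) => r.protocol === protocol && r.protocolPath === layerPath);
    if (layer.length !== 1) return undefined; // zero or many records at this layer
    match = layer[0];
  }
  return match;
}
```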
While I was putting myself to sleep last night thinking about this stuff, it occurred to me that a simple yet generalized spec for RecordsRead could be:
Support the same filter as RecordsQuery, but return success (record + data) only if the query returns exactly 1 record. Error out otherwise.
This is rather intuitive to understand IMO, flexible, and would render discussion around parentId, latest record, limit, etc. mostly moot. It also seems to meet all the use cases discussed so far.
Am I onto something, or do I just need more sleep, @csuwildcat, @diehuxx, @LiranCohen?
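The "same filter as RecordsQuery, exactly one result" idea is sketched below. Filters are simplified to equality matching on a few fields, and the 404/409 statuses are illustrative choices for "error out otherwise". Neither reflects the real RecordsQuery filter semantics or error codes.

```typescript
interface StoredRecord {
  recordId: string;
  protocol: string;
  protocolPath: string;
  parentId?: string;
  contextId?: string;
  published: boolean;
}

// Hypothetical filter: any subset of the fields above, matched by equality.
type RecordsFilter = Partial<Omit<StoredRecord, "recordId">>;

type ReadResult =
  | { status: 200; record: StoredRecord }
  | { status: 404 }
  | { status: 409; count: number };

function matches(r: StoredRecord, f: RecordsFilter): boolean {
  return (Object.keys(f) as (keyof RecordsFilter)[]).every((k) => f[k] === undefined || r[k] === f[k]);
}

// Run the same filter a query would; succeed only on exactly one match.
function readExactlyOne(store: StoredRecord[], filter: RecordsFilter): ReadResult {
  const found = store.filter((r) => matches(r, filter));
  if (found.length === 0) return { status: 404 };
  if (found.length > 1) return { status: 409, count: found.length };
  return { status: 200, record: found[0] };
}
```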
I talked to @diehuxx yesterday and she caught me up on the latest ideas that were thrown around.
I'm still digesting the idea of failing if the read matches more than one record.
From my understanding, the main intent of having it fail is to prevent some sort of "foot gun" where a user gets back a result that wasn't what they intended to get. So getting ANY result at all lets the user know that there was only 1 result to begin with.
I think this has some merit: it makes the intent really clear, but it is very limited.
So this would be useful for things like profile and profile/avatar, but less useful for things like game/score or stock/tick.
@thehenrytsai I tossed around that idea initially, of just allowing users to optionally include any filters that are available in RecordsQuery. I don't think that's really bad in any way; it gives users full control. I'm just still unsure about the failure aspect.
I do think that, in the 'real world' examples I've been running through my head, parentId seems the most useful: you could perform a RecordsQuery for a list of games and then, having the parentId of the game you want, individually RecordsRead the latest score of that game on demand.
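The game/score flow above is sketched here over a hypothetical in-memory store. A query-like step lists the games. A read-like step, keyed by protocolPath + parentId, returns the latest score for one chosen game. Function names are illustrative only.

```typescript
interface StoredRecord {
  recordId: string;
  protocolPath: string;
  parentId?: string;
  dateCreated: string; // ISO 8601 timestamp
  data: string;
}

// Step 1 (RecordsQuery-like): list the game records.
function queryGames(store: StoredRecord[]): StoredRecord[] {
  return store.filter((r) => r.protocolPath === "game");
}

// Step 2 (RecordsRead-like, keyed by parentId): latest child record under one parent.
function readLatestChild(store: StoredRecord[], protocolPath: string, parentId: string): StoredRecord | undefined {
  let latest: StoredRecord | undefined;
  for (const r of store) {
    if (r.protocolPath !== protocolPath || r.parentId !== parentId) continue;
    if (latest === undefined || Date.parse(r.dateCreated) > Date.parse(latest.dateCreated)) latest = r;
  }
  return latest;
}
```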
But that's not a hill I'm willing to die on; I'm just still digesting the idea of failing if there is more than a single record, and what use cases that satisfies.
Some might not believe it, but this is now done by #470!!