microsoft / Trill

Trill is a single-node query processor for temporal or streaming data.

Provider model for IStreamable

cybertyche opened this issue

Requirement 1: Create an analogue to IQueryable or IQbservable for IStreamable.

That means having a new interface, IQStreamable, and a provider type, IStreamProvider, that together build up an expression in LINQ and evaluate it later. The expression is inspectable and manipulable just like any other LINQ expression.
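To make Requirement 1 concrete, here is a minimal sketch of what an IQStreamable/IStreamProvider pair could look like if it copied the shape of IQueryable<T>/IQueryProvider. Everything here is hypothetical: the members (Expression, Provider, CreateQuery), the DemoStreamable/DemoProvider/QStreamable classes, and the single type parameter are assumptions for illustration, not Trill's API. What it shows is that an operator such as Where only appends a MethodCallExpression node to the tree; nothing executes until some provider chooses to interpret that tree.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical sketch, not Trill's API: a stream query that carries an
// expression tree plus the provider that owns it, like IQueryable<T>.
public interface IQStreamable<TPayload>
{
    Expression Expression { get; }
    IStreamProvider Provider { get; }
}

// Hypothetical counterpart of IQueryProvider.CreateQuery<T>.
public interface IStreamProvider
{
    IQStreamable<TPayload> CreateQuery<TPayload>(Expression expression);
}

public sealed class DemoStreamable<TPayload> : IQStreamable<TPayload>
{
    // Root of a query: the tree is a single constant node pointing at this source.
    public DemoStreamable(IStreamProvider provider)
    {
        Provider = provider;
        Expression = Expression.Constant(this, typeof(IQStreamable<TPayload>));
    }

    // A node produced by an operator such as Where.
    internal DemoStreamable(IStreamProvider provider, Expression expression)
    {
        Provider = provider;
        Expression = expression;
    }

    public Expression Expression { get; }
    public IStreamProvider Provider { get; }
}

public sealed class DemoProvider : IStreamProvider
{
    // Building a query never runs it; the provider just wraps the larger tree.
    public IQStreamable<TPayload> CreateQuery<TPayload>(Expression expression)
        => new DemoStreamable<TPayload>(this, expression);
}

public static class QStreamable
{
    // Hypothetical operator: records a call to itself in the tree instead of filtering.
    public static IQStreamable<T> Where<T>(
        this IQStreamable<T> source, Expression<Func<T, bool>> predicate)
    {
        var self = new Func<IQStreamable<T>, Expression<Func<T, bool>>, IQStreamable<T>>(Where).Method;
        var call = Expression.Call(self, source.Expression, Expression.Quote(predicate));
        return source.Provider.CreateQuery<T>(call);
    }
}
```

Usage: `new DemoStreamable<int>(new DemoProvider()).Where(val => val % 2 == 0)` yields a query whose `Expression` is a `Where` call node holding the quoted lambda. A real provider would walk that tree (for instance with an `ExpressionVisitor`) and translate each node into engine calls; that translation step is the seam Requirement 2 appears to be about.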

Requirement 2: Ensure that the provider can stand independently of the engine.

Not only would this allow someone to implement an engine behind the Trill API with different characteristics, but it would also allow the implementation to evolve independently of the API if it needs to.

Biggest open question: Do we precisely mirror the existing IStreamable API, or do we instead try to implement API 2.0 with lessons learned?

For the most part, we are incredibly proud of our API. However, there are a few places where we would make changes if we could. Those places are:

  • Making IQStreamable have one type parameter instead of two
  • Consequently, having a group-by syntax that is actually identical to IQueryable's, instead of GroupApply()
  • No longer requiring QueryContainer to register output, only input
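For readers unfamiliar with the group-by point above: with IQueryable, a `group … by` clause in query-comprehension syntax is translated by the C# compiler into a call to GroupBy, so a one-type-parameter stream type would get this syntax once it exposes a GroupBy operator. The sketch below uses only standard LINQ (`AsQueryable` over an in-memory array) and no Trill API; `ParityCount` and `GroupBySyntaxDemo` are illustrative names, and how GroupApply() would map onto this is not shown.

```csharp
using System.Linq;

// Illustrative result type (not from Trill): one row per parity bucket.
public sealed class ParityCount
{
    public ParityCount(int key, int count)
    {
        Key = key;
        Count = count;
    }

    public int Key { get; }
    public int Count { get; }
}

public static class GroupBySyntaxDemo
{
    // Query-comprehension syntax over a plain IQueryable<int>. The compiler
    // rewrites "group v by v % 2 into g" into a Queryable.GroupBy call, so the
    // resulting expression tree is Select(OrderBy(GroupBy(source, ...), ...), ...).
    public static IQueryable<ParityCount> CountByParity(IQueryable<int> source) =>
        from v in source
        group v by v % 2 into g
        orderby g.Key
        select new ParityCount(g.Key, g.Count());
}
```

As the thread describes it, the two-parameter IStreamable<K, P> routes grouping through GroupApply() instead, which is why this syntax is not available there today.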

Intent

Perhaps we should outline the intent of introducing IQStreamable. Are there specific use cases we're trying to satisfy right from the start? Any specific scope? Does it necessarily include things like a client library, serializing the expression tree, etc.?

History

Is there already a provider implementation in one of the product groups using Trill (such as Stream Analytics)? It may not be generic enough, but something battle-tested like that could serve as a good reference. Even if we external folks can't see it, perhaps you could review it or interview the team and share any insights they had.

Open Questions

Do we precisely mirror the existing IStreamable API, or do we instead try to implement API 2.0 with lessons learned?

I would say mirroring the existing API (keeping it homomorphic, if I'm using that term right) would allow developers to switch seamlessly between the code and code-as-data modes of using Trill. It would be delightful for IStreamable/IQStreamable to more closely match IEnumerable/IQueryable and IObservable/IQbservable (no more TKey, a group-by operator, etc.), but if that's not happening any time soon for IStreamable, perhaps it's not worth the divergence for IQStreamable.

Other Questions

Where would you like feedback on specific changes? On the commit, on a new PR for the provider branch (which you'd open), or inline here?

For example…

Subscriptions

I was going to ask whether IQStreamable<> ought to inherit from IStreamable<>, the way IQueryable<> inherits from IEnumerable<> and IQbservable<> inherits from IObservable<>. That would allow a developer experience like:

IQStreamable<int> subject = ctx.Subject<int>("my-counter");
IQStreamable<int> query   = subject.Where(val => val % 2 == 0);
IDisposable subscription  = query.Subscribe(val => Console.WriteLine($"Got an even: {val}"));

Otherwise, what are your thoughts on how the subscription API might look?

<tangent>
As I write this, though, I'm wondering: what if we instead had a SubscribeAsync (and DisposeAsync, I guess), since these subscriptions might cross device boundaries (client and server), where we'd be dealing with network latency? And what about separating 'making' the subscription (registering the query) from 'starting' it (processing values with the standing query), so that clients have better control over its lifetime?
…but perhaps these considerations need to be part of API 2.0, not this implementation of IQStreamable… Perhaps clarifying the intent of this first implementation would tell us whether we should go this far.
</tangent>

A few answers:

Q: Will the IQStreamable and IStreamable APIs match?
A: Yes - the question is whether IQStreamable matches IStreamable in its current form, or whether we use IQStreamable to get feedback on a new API and then add IStreamable<P> to match. The future steady state will definitely not be one where IQStreamable and IStreamable differ. The feedback we're getting is that the two-type-parameter IStreamable<K, P> is cumbersome and that full LINQ comprehension syntax support is desirable, which means we need GroupBy. That said, there are existing customers, so this would be a fairly big breaking change in the long run. The transitional plan would likely be to deprecate IStreamable<K, P> once IStreamable<P> is implemented and introduced.

Q: Is there an already-existing provider somewhere within Microsoft that we could learn from?
A: Nope - or else we wouldn't be doing this from scratch. :-)

Q: Will IQStreamable inherit from IStreamable?
A: Not sure yet. It's a distinct possibility but not certain. The benefits seem to be mostly around egress, but given that egress in Trill is not as trivial as it is with IEnumerable or IObservable, it's not so clear-cut.

Q: Regarding SubscribeAsync etc.
A: Given that Trill is a single-node engine, this has been low on the priority list. More commonly, people ask for more thread safety within the engine itself. The separation between "making" and "starting" the query that you mentioned already exists if you use a QueryContainer object for high availability, since a separate "Restore" step begins execution.