Service trait and lifetimes

Question

Service trait and lifetimes

withoutboats opened this issue 8 years ago · comments

I was messing around with abstractions on top of Service, and I'm not sure I understand it but it seems to me that there may be a rather deep problem with the API.

(This is basically an ellaboration of the fears I've held for a while that without impl Trait in traits or at least ATCs, futures will not really work.)

The high level problem is this: Service::Future is essentially required to have no relationship in lifetime to the borrow of self in call. A definition that linked those two lifetimes would require associated type constructors.

This means that you cannot borrow self at any asynchronous point during the service, only while constructing the future. This seems bad!

Consider this simple service combinator, which chains two services together:

struct ServiceChain<S1, S2> {
    first: S1,
    second: S2,
}

impl<S1, S2> Service for ServiceChain<S1, S2>
where
    S1: Service + Sync + 'static,
    S2: Service<Request = S1::Response, Error = S1::Error> + Sync + 'static,
    S1::Future: Send,
    S2::Future: Send,
{
    type Request = S1::Request;
    type Response = S2::Response;
    type Error = S1::Error;
    type Future = futures::future::BoxFuture<Self::Response, S1::Error>;

    fn call(&self, request: Self::Request) -> Self::Future {
        self.first.call(request)
                  .and_then(move |intermediate| self.second.call(intermediate))
                  .boxed()
    }
}

Is it intentional that Service's future type is defined so that it could outlive the service being borrowed? Is a service supposed to have a method that constructs the type needed to process the request (separately for each request) and passes it into the future?

Chaining futures and streams can suffer from a sort of related problem which I was forced to solved with reference counting.

Ashley Mannix · Answer 1 · Sun Dec 04 2016 07:59:36 GMT+0800 (China Standard Time)

This seems like the kind of problem the NewService trait should solve? Instead of requiring a reference to self with an appropriate lifetime to access the service, you just call the associated factory method that gives you an instance with the right lifetime.

Otherwise you could return a tuple of the result from the first future and a copy of the service that consumes the result. But copying sucks.

So after all that it seems like being able to chain that lifetime through the future chain you're constructing is the best possible way to go.

I'm also holding my breath for impl Trait to make futures generally usable without boxing.

srrrse · Answer 2 · Sun Dec 04 2016 10:20:32 GMT+0800 (China Standard Time)

NewService is a viable approach, but I'm not clear if its intended to be the approach. My impression was that NewService was a potentially expensive operation which would be called e.g. when your server boots, not something that's called once for every request.

srrrse · Answer 3 · Sun Dec 04 2016 12:32:57 GMT+0800 (China Standard Time)

This definition of Service would behave how I would expect it to & compiles today, but it forces a dynamic dispatch at every middleware level (in practice, I fear that this will be how tokio is used regardless until we have impl Trait in traits):

trait Service {
     type Request;
     type Response;
     type Error;
     fn call<'a>(&'a self, request: Self::Request)
         -> Box<Future<Item = Self::Response, Error = Self::Error> + Send + 'a>;
}

Carl Lerche · Answer 4 · Mon Dec 05 2016 02:25:07 GMT+0800 (China Standard Time)

Could you elaborate more on the case that required the response future to borrow state from the service?

To clarify some things, it is expected that the response future should not be bound to the service value itself. Some reasons for this would be to allow the service value to be shared across threads (Sync) and for the response future to be able to move across threads (to thread pools, other executors, etc...)

The current rule of thumb is that a service value represents a single connection. So NewService will be called once per connection. This isn't a hard rule given that connection pools etc abstract over the connection.

Anyway, the short version is that the decoupling of the response and the service is intentional to push towards an "higher level" use case. If you could share more about your specific case, that would help me understand your constraints better.

srrrse · Answer 5 · Mon Dec 05 2016 04:56:28 GMT+0800 (China Standard Time)

To clarify some things, it is expected that the response future should not be bound to the service value itself. Some reasons for this would be to allow the service value to be shared across threads (Sync) and for the response future to be able to move across threads (to thread pools, other executors, etc...)

Though you'd have to deal with scoping, if the Service is Sync, a reference to it is Send, so it doesn't preclude these.

If you could share more about your specific case, that would help me understand your constraints better.

Any middleware which performs an asynchronous operation before or after calling the wrapped service will have this problem. The ServiceChain is a very abstract example, but this seems extremely common. I don't have a specific use case myself because I'm writing a framework. But how about an Auth middleware, that authenticates the user somehow before calling into the wrapped service?

This precludes using self in any then, and_then, or_else, map, or map_err combinator. My impression is that the timeout middleware in the docs is the exception, rather than the rule.

Carl Lerche · Answer 6 · Mon Dec 05 2016 07:37:19 GMT+0800 (China Standard Time)

There has been a bunch of discussion about this in Gitter. We should probably take it to Gitter / IRC to discuss this more. Ping me in IRC / Gitter and we can try to suss out the issues (though I may not be super responsive this week).

Aaron Turon · Answer 7 · Wed Dec 14 2016 02:10:16 GMT+0800 (China Standard Time)

@withoutboats Incidentally, @sgrif brought up exactly the same issue previously.

Note that, in order for the future to have access to borrowed data from &self, it cannot be 'static. That means, in particular, that the data in self must outlive the future. Making this work requires something akin to scoped threads for futures -- which is something we want to (and know how to) provide, but haven't built into the core library yet. (@nikomatsakis has been working on a Rayon integration that does this.)

So, in principle, we could move to a model that threaded through the lifetime of self for Service, and allow "pinning" the data outside of the event loop.

However, when we worked through all this in the past, the basic feeling was that the complexity wasn't worth it; an Arc is almost certainly good enough, especially since it would only need to be cloned once per service instantiation. In many cases, it might even be reasonable to use purposeful leakage, converting a Box<T> to a &'static T, if the entire app is a server that will keep the boxed data alive for its entire duration anyway.

Hope that helps. This is certainly a point we may want to revisit. But right now I'd reach for Arc and friends, and see how far that goes.

srrrse · Answer 8 · Wed Dec 14 2016 03:22:58 GMT+0800 (China Standard Time)

I also made the connection to scoped threads. :-)

However, when we worked through all this in the past, the basic feeling was that the complexity wasn't worth it; an Arc is almost certainly good enough, especially since it would only need to be cloned once per service instantiation.

My impression is that the Arcs need to be cloned for every time the service is called, because they'll need to be cloned as a part of constructing the future the service returns. Am I mistaken?

Carl Lerche · Answer 9 · Wed Dec 14 2016 06:14:15 GMT+0800 (China Standard Time)

That is correct, they would need to be cloned each time the service is called. That being said, I would be surprised if it resulted in a huge perf hit. Another option would be to use Rc and force the service handle to be pinned to a single thread...

srrrse · Answer 10 · Wed Dec 14 2016 08:58:39 GMT+0800 (China Standard Time)

I would also be surprised if its a huge performance hit. I think my feelings are largely puritanical rather than practical, and (until this becomes easier to implement) y'all have made the right call for now.

A sort of related thing I've been thinking about though is this trait.

trait Middleware<S: Service> {
    type Request;
    type Response;
    type Error;
    fn call(&self, service: &S, request: Self::Request) -> impl Future<Self::Response, Self::Error>;
}

Combined with adding a default method to service:

   fn wrap<M: Middleware<Self>(self, middleware: M) -> WrappedService<Self, M> { ... }

struct WrappedService<S: Service, M: Middleware<S>> {
   service: S,
   middleware: M,
}

And WrappedService would implement Service by passing S to M::call. It would then be very easy to build up middleware chains:

   some_service.wrap(some_middleware).wrap(some_middleware).wrap(some_middleware)

If we're going to have to Arc inner services, though, I think middleware & wrapped service would have to look like this:

trait Middleware<S: Service> {
    type Request;
    type Response;
    type Error;
    type Future: Future<Item = Self::Request, Error = Self::Error>;
    fn call(&self, service: &Arc<S>, request: Self::Request) -> Self::Future;
}

struct WrappedService<S: Service, M: Middleware<S> {
    service: Arc<S>,
    middleware: M,
}

(The Arc is passed by reference so you could clone or not as necessary for implementing your middleware).

Carl Lerche · Answer 11 · Sat Dec 17 2016 00:28:05 GMT+0800 (China Standard Time)

I do believe that we will have some more specialized traits sooner than later. I haven't thought too much about the specifics of those yet.

srrrse · Answer 12 · Sun Dec 25 2016 08:33:14 GMT+0800 (China Standard Time)

#13 changed Service::call to take &mut self. That has significant implications for this issue.

There are multiple ways to combine two services, but the specific combination at issue is sequentially chaining two asynchronous calls. So this is rather narrowly focused on that question.

It seems like the example in the first comment of this issue should now be rewritten to use a Mutex:

struct ServiceChain<S1, S2> {
    first: S1,
    second: Arc<Mutex<S2>>,
}

impl<S1, S2> Service for ServiceChain<S1, S2>
where
    ...
{
    ...

    fn call(&mut self, request: Self::Request) -> Self::Future {
        let second = self.second.clone();
        self.first.call(request)
                  .and_then(move |intermediate| second.lock().unwrap().call(intermediate))
                  .boxed()
    }
}

This compiles, but I'm not certain it's correct. To what extent does this Mutex introduce synchronization between multiple calls to this service?

Beyond the immediate practical question, it seems to have gone uncommented that #13 forecloses on this possibility:

That means, in particular, that the data in self must outlive the future. Making this work requires something akin to scoped threads for futures -- which is something we want to (and know how to) provide, but haven't built into the core library yet.

If the lifetime of Service::Future were tied to the lifetime of the borrow of self in Service::call, that would mean you could not call this service again until that future is completed; within a connection every call would become synchronous. This wasn't an issue with an immutable borrow.

srrrse · Answer 13 · Sun Jan 01 2017 06:35:42 GMT+0800 (China Standard Time)

I'm still confused (even concerned) about this issue. Is it correct now to wrap services in a Mutex when you need to chain their responses? Is the potential "scoped threads for futures" still achievable with &mut self services?

I think chaining the futures returned by services is going to be overwhelmingly common for high level clients of tokio. I feel a disconnect around this issue - the responses from y'all suggest you think this is sort of a niche pattern.

Niko Matsakis · Answer 14 · Mon Jan 02 2017 22:54:56 GMT+0800 (China Standard Time)

Not sure if it is relevant, but here is my integration of futures into Rayon:

https://github.com/nikomatsakis/rayon/pull/193

This permits you to do things like:

rayon::scope(|s| {
    s.spawn_future(...some future...);
});

where the future can use data which outlives the call to rayon::scope().

Aaron Turon · Answer 15 · Fri Jan 06 2017 02:48:23 GMT+0800 (China Standard Time)

@withoutboats Thanks much for raising these concerns (and sorry for the slow response).

While I didn't feel too concerned about requiring an Arc for your original chaining example, you make some very good points about &mut self that we indeed did not consider when making the change. Requiring a mutex for chaining seems very bad to me.

FWIW I completely agree with you that chaining and otherwise adding additional async code around services is going to be very common, and hence this is a serious issue we need to think carefully about. I'm going to dig into the tradeoffs around &mut self a bit more, and possibly propose reverting that change.

Aaron Turon · Answer 16 · Tue Jan 10 2017 05:57:10 GMT+0800 (China Standard Time)

Note: we are rolling back the &mut self change for 0.1, given the discussion above. While there's probably more work to do here to figure out a decent synchronization strategy with services and futures, this seems to be the most conservative choice for now.

Taylor Cramer · Answer 17 · Thu Jun 22 2017 13:29:32 GMT+0800 (China Standard Time)

What's the status of this? Is it just blocked on ATCs at this point?

srrrse · Answer 18 · Thu Jun 22 2017 13:49:55 GMT+0800 (China Standard Time)

@cramertj I think it may be more complicated than that; IIRC many of the executor APIs are not designed to run non-static futures, so this wouldn't necessarily be useful for many cases.