tokio-rs / tokio-service

The core `Service` trait in Tokio and support

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Service trait and lifetimes

withoutboats opened this issue · comments

I was messing around with abstractions on top of Service, and I'm not sure I understand it but it seems to me that there may be a rather deep problem with the API.

(This is basically an ellaboration of the fears I've held for a while that without impl Trait in traits or at least ATCs, futures will not really work.)

The high level problem is this: Service::Future is essentially required to have no relationship in lifetime to the borrow of self in call. A definition that linked those two lifetimes would require associated type constructors.

This means that you cannot borrow self at any asynchronous point during the service, only while constructing the future. This seems bad!

Consider this simple service combinator, which chains two services together:

struct ServiceChain<S1, S2> {
    first: S1,
    second: S2,
}

impl<S1, S2> Service for ServiceChain<S1, S2>
where
    S1: Service + Sync + 'static,
    S2: Service<Request = S1::Response, Error = S1::Error> + Sync + 'static,
    S1::Future: Send,
    S2::Future: Send,
{
    type Request = S1::Request;
    type Response = S2::Response;
    type Error = S1::Error;
    type Future = futures::future::BoxFuture<Self::Response, S1::Error>;

    fn call(&self, request: Self::Request) -> Self::Future {
        self.first.call(request)
                  .and_then(move |intermediate| self.second.call(intermediate))
                  .boxed()
    }
}

Is it intentional that Service's future type is defined so that it could outlive the service being borrowed? Is a service supposed to have a method that constructs the type needed to process the request (separately for each request) and passes it into the future?

Chaining futures and streams can suffer from a sort of related problem which I was forced to solved with reference counting.

This seems like the kind of problem the NewService trait should solve? Instead of requiring a reference to self with an appropriate lifetime to access the service, you just call the associated factory method that gives you an instance with the right lifetime.

Otherwise you could return a tuple of the result from the first future and a copy of the service that consumes the result. But copying sucks.

So after all that it seems like being able to chain that lifetime through the future chain you're constructing is the best possible way to go.

I'm also holding my breath for impl Trait to make futures generally usable without boxing.

NewService is a viable approach, but I'm not clear if its intended to be the approach. My impression was that NewService was a potentially expensive operation which would be called e.g. when your server boots, not something that's called once for every request.

This definition of Service would behave how I would expect it to & compiles today, but it forces a dynamic dispatch at every middleware level (in practice, I fear that this will be how tokio is used regardless until we have impl Trait in traits):

trait Service {
     type Request;
     type Response;
     type Error;
     fn call<'a>(&'a self, request: Self::Request)
         -> Box<Future<Item = Self::Response, Error = Self::Error> + Send + 'a>;
}

Could you elaborate more on the case that required the response future to borrow state from the service?

To clarify some things, it is expected that the response future should not be bound to the service value itself. Some reasons for this would be to allow the service value to be shared across threads (Sync) and for the response future to be able to move across threads (to thread pools, other executors, etc...)

The current rule of thumb is that a service value represents a single connection. So NewService will be called once per connection. This isn't a hard rule given that connection pools etc abstract over the connection.

Anyway, the short version is that the decoupling of the response and the service is intentional to push towards an "higher level" use case. If you could share more about your specific case, that would help me understand your constraints better.

To clarify some things, it is expected that the response future should not be bound to the service value itself. Some reasons for this would be to allow the service value to be shared across threads (Sync) and for the response future to be able to move across threads (to thread pools, other executors, etc...)

Though you'd have to deal with scoping, if the Service is Sync, a reference to it is Send, so it doesn't preclude these.

If you could share more about your specific case, that would help me understand your constraints better.

Any middleware which performs an asynchronous operation before or after calling the wrapped service will have this problem. The ServiceChain is a very abstract example, but this seems extremely common. I don't have a specific use case myself because I'm writing a framework. But how about an Auth middleware, that authenticates the user somehow before calling into the wrapped service?

This precludes using self in any then, and_then, or_else, map, or map_err combinator. My impression is that the timeout middleware in the docs is the exception, rather than the rule.

There has been a bunch of discussion about this in Gitter. We should probably take it to Gitter / IRC to discuss this more. Ping me in IRC / Gitter and we can try to suss out the issues (though I may not be super responsive this week).

@withoutboats Incidentally, @sgrif brought up exactly the same issue previously.

Note that, in order for the future to have access to borrowed data from &self, it cannot be 'static. That means, in particular, that the data in self must outlive the future. Making this work requires something akin to scoped threads for futures -- which is something we want to (and know how to) provide, but haven't built into the core library yet. (@nikomatsakis has been working on a Rayon integration that does this.)

So, in principle, we could move to a model that threaded through the lifetime of self for Service, and allow "pinning" the data outside of the event loop.

However, when we worked through all this in the past, the basic feeling was that the complexity wasn't worth it; an Arc is almost certainly good enough, especially since it would only need to be cloned once per service instantiation. In many cases, it might even be reasonable to use purposeful leakage, converting a Box<T> to a &'static T, if the entire app is a server that will keep the boxed data alive for its entire duration anyway.

Hope that helps. This is certainly a point we may want to revisit. But right now I'd reach for Arc and friends, and see how far that goes.

I also made the connection to scoped threads. :-)

However, when we worked through all this in the past, the basic feeling was that the complexity wasn't worth it; an Arc is almost certainly good enough, especially since it would only need to be cloned once per service instantiation.

My impression is that the Arcs need to be cloned for every time the service is called, because they'll need to be cloned as a part of constructing the future the service returns. Am I mistaken?

That is correct, they would need to be cloned each time the service is called. That being said, I would be surprised if it resulted in a huge perf hit. Another option would be to use Rc and force the service handle to be pinned to a single thread...

I would also be surprised if its a huge performance hit. I think my feelings are largely puritanical rather than practical, and (until this becomes easier to implement) y'all have made the right call for now.

A sort of related thing I've been thinking about though is this trait.

trait Middleware<S: Service> {
    type Request;
    type Response;
    type Error;
    fn call(&self, service: &S, request: Self::Request) -> impl Future<Self::Response, Self::Error>;
}

Combined with adding a default method to service:

   fn wrap<M: Middleware<Self>(self, middleware: M) -> WrappedService<Self, M> { ... }

struct WrappedService<S: Service, M: Middleware<S>> {
   service: S,
   middleware: M,
}

And WrappedService would implement Service by passing S to M::call. It would then be very easy to build up middleware chains:

   some_service.wrap(some_middleware).wrap(some_middleware).wrap(some_middleware)

If we're going to have to Arc inner services, though, I think middleware & wrapped service would have to look like this:

trait Middleware<S: Service> {
    type Request;
    type Response;
    type Error;
    type Future: Future<Item = Self::Request, Error = Self::Error>;
    fn call(&self, service: &Arc<S>, request: Self::Request) -> Self::Future;
}

struct WrappedService<S: Service, M: Middleware<S> {
    service: Arc<S>,
    middleware: M,
}

(The Arc is passed by reference so you could clone or not as necessary for implementing your middleware).

I do believe that we will have some more specialized traits sooner than later. I haven't thought too much about the specifics of those yet.

#13 changed Service::call to take &mut self. That has significant implications for this issue.

There are multiple ways to combine two services, but the specific combination at issue is sequentially chaining two asynchronous calls. So this is rather narrowly focused on that question.

It seems like the example in the first comment of this issue should now be rewritten to use a Mutex:

struct ServiceChain<S1, S2> {
    first: S1,
    second: Arc<Mutex<S2>>,
}

impl<S1, S2> Service for ServiceChain<S1, S2>
where
    ...
{
    ...

    fn call(&mut self, request: Self::Request) -> Self::Future {
        let second = self.second.clone();
        self.first.call(request)
                  .and_then(move |intermediate| second.lock().unwrap().call(intermediate))
                  .boxed()
    }
}

This compiles, but I'm not certain it's correct. To what extent does this Mutex introduce synchronization between multiple calls to this service?

Beyond the immediate practical question, it seems to have gone uncommented that #13 forecloses on this possibility:

That means, in particular, that the data in self must outlive the future. Making this work requires something akin to scoped threads for futures -- which is something we want to (and know how to) provide, but haven't built into the core library yet.

If the lifetime of Service::Future were tied to the lifetime of the borrow of self in Service::call, that would mean you could not call this service again until that future is completed; within a connection every call would become synchronous. This wasn't an issue with an immutable borrow.

I'm still confused (even concerned) about this issue. Is it correct now to wrap services in a Mutex when you need to chain their responses? Is the potential "scoped threads for futures" still achievable with &mut self services?

I think chaining the futures returned by services is going to be overwhelmingly common for high level clients of tokio. I feel a disconnect around this issue - the responses from y'all suggest you think this is sort of a niche pattern.

Not sure if it is relevant, but here is my integration of futures into Rayon:

https://github.com/nikomatsakis/rayon/pull/193

This permits you to do things like:

rayon::scope(|s| {
    s.spawn_future(...some future...);
});

where the future can use data which outlives the call to rayon::scope().

@withoutboats Thanks much for raising these concerns (and sorry for the slow response).

While I didn't feel too concerned about requiring an Arc for your original chaining example, you make some very good points about &mut self that we indeed did not consider when making the change. Requiring a mutex for chaining seems very bad to me.

FWIW I completely agree with you that chaining and otherwise adding additional async code around services is going to be very common, and hence this is a serious issue we need to think carefully about. I'm going to dig into the tradeoffs around &mut self a bit more, and possibly propose reverting that change.

Note: we are rolling back the &mut self change for 0.1, given the discussion above. While there's probably more work to do here to figure out a decent synchronization strategy with services and futures, this seems to be the most conservative choice for now.

What's the status of this? Is it just blocked on ATCs at this point?

@cramertj I think it may be more complicated than that; IIRC many of the executor APIs are not designed to run non-static futures, so this wouldn't necessarily be useful for many cases.