apple / swift-distributed-actors

Peer-to-peer cluster implementation for Swift Distributed Actors

Home Page: https://apple.github.io/swift-distributed-actors/

Convert SWIMActor.sendPing to use Swift Concurrency (to avoid NIO shutdown errors)

ktoso opened this issue

It is a bit painful to stop an actor that periodically sends things mid-operation if a shutdown happens during its execution.

Specifically, code like this in SWIMActor:

    internal func sendPingRequests(_ directive: SWIMInstance.SendPingRequestDirective) async {
        // We are only interested in successful pings, as a single success tells us the node is
        // still alive. Therefore we propagate only the first success, but no failures.
        // The failure case is handled through the timeout of the whole operation.
        let eventLoop = self.actorSystem._eventLoopGroup.next()
        let firstSuccessful = eventLoop.makePromise(of: SWIM.PingResponse<SWIMActor, SWIMActor>.self)
        let pingTimeout = directive.timeout
        let peerToPing = directive.target

        let startedSendingPingRequestsSentAt: DispatchTime = .now()
        let pingRequestResponseTimeFirstTimer = self.swim.metrics.shell.pingRequestResponseTimeFirst
        firstSuccessful.futureResult.whenComplete { result in
            switch result {
            case .success: pingRequestResponseTimeFirstTimer.recordInterval(since: startedSendingPingRequestsSentAt)
            case .failure: ()
            }
        }

is prone to hitting "Cannot schedule tasks on an EventLoop that has already shut down." from NIO when we call shutdown() on the system, which we do a lot in tests. The promise and its whenComplete callback are bound to an event loop, so any work they schedule after the loop has shut down triggers this error.

We should use Swift concurrency for this task; that way we won't run into this issue. As we shut down, the tasks will be cancelled and SWIM will stop its work anyway.
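A minimal sketch of the "first success wins, failures are ignored, timeout covers the rest" logic using a task group instead of an EventLoop promise. This is not the actual SWIMActor code: `PingOutcome`, `PingFailed`, and `firstSuccessfulPing` are hypothetical names, and each ping is modelled as a plain async closure rather than a real remote call. It also assumes the ping closures respond to cancellation.

```swift
// Hypothetical outcome type; not part of swift-distributed-actors.
enum PingOutcome<Response: Sendable>: Sendable {
    case success(Response)
    case failure
    case timedOut
}

// Hypothetical error used to simulate a failed ping.
struct PingFailed: Error {}

/// Returns the first successful response, or nil if the timeout fires first
/// or the surrounding task is cancelled (for example during shutdown).
func firstSuccessfulPing<Response: Sendable>(
    timeoutNanoseconds: UInt64,
    pings: [@Sendable () async throws -> Response]
) async -> Response? {
    await withTaskGroup(of: PingOutcome<Response>.self) { group -> Response? in
        for ping in pings {
            group.addTask {
                do {
                    return .success(try await ping())
                } catch {
                    // Individual failures are not propagated.
                    return .failure
                }
            }
        }

        // The timeout is just another child task. Task.sleep throws when
        // cancelled, so this child finishes promptly on cancellation.
        group.addTask {
            try? await Task.sleep(nanoseconds: timeoutNanoseconds)
            return .timedOut
        }

        for await outcome in group {
            switch outcome {
            case .success(let response):
                group.cancelAll() // stop the remaining pings and the timer
                return response
            case .failure:
                continue // keep waiting for a success or the timeout
            case .timedOut:
                group.cancelAll()
                return nil
            }
        }
        return nil
    }
}
```

Because the child tasks belong to the calling task, cancelling that task cancels the pings and the timer as well, and nothing has to be scheduled on an event loop that may already be gone. The response-time metric could then be recorded right after the call returns a non-nil value.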