mercurius-js / mercurius-gateway

Mercurius federation support plugin

Load balancer must check if the server in the provided pool is online before sending a request

SiNONiMiTY opened this issue

The title says it all.

I am encountering a scenario where I provide two URLs for a single subgraph as an array:

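const Fastify = require('fastify')
const mercuriusGateway = require('@mercuriusjs/gateway')

const gateway = Fastify()
gateway.register(mercuriusGateway, {
    gateway: {
        services: [
            {
                "name": "user",
                // Two upstream URLs for the same subgraph; requests are balanced across them
                "url": [
                    "http://endpoint1:4001/graphql",
                    "http://endpoint2:4001/graphql"
                ],
                "schema": "type Query { id: ID }"
            }
        ]
    }
})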

endpoint2 is intentionally taken down and only endpoint1 is working. However,
when sending queries to the gateway, I occasionally receive ECONNREFUSED errors for endpoint2.

The load balancing mechanism should first check whether the host is reachable (for example, with a test ping) before sending a request.

Unfortunately it's a bit more complex than sending a "ping", as those errors come from existing sockets that are truncated.

How are you shutting down your upstream servers? Are they closing gracefully or are they crashing?

Starting the gateway with only one online subgraph out of the two provided

Thanks, that helps!

I think there is a bug in undici's BalancedPool: it routes requests to an upstream even if it could not connect to it, and it does not retry or send the request elsewhere when the connection fails. Things stabilize over time because of the BalancedPool algorithm, so only a small number of requests fail.

The bad news is that I don't have time right now to fix it there.

Yes! I noticed that the balancing algorithm eventually only selects the online server after sending some requests.
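
Until this is addressed upstream, the behavior discussed in this thread (a request is routed to an unreachable upstream and is not retried elsewhere) could be worked around at the application level. The following is only a rough sketch, not how undici's BalancedPool or mercurius works: UpstreamTracker and sendWithFailover are hypothetical helpers, the cooldown value is arbitrary, and send stands for whatever function actually performs the HTTP request and rejects with an error carrying code ECONNREFUSED (directly or on err.cause).

```javascript
// Hypothetical helper, NOT part of mercurius or undici.
// Tracks which upstreams recently refused connections and skips them.
class UpstreamTracker {
  constructor (urls, { cooldownMs = 5000, now = Date.now } = {}) {
    this.urls = urls
    this.cooldownMs = cooldownMs // assumed value; tune for your deployment
    this.now = now               // injectable clock, useful for testing
    this.downUntil = new Map()   // url -> timestamp until which it is skipped
    this.cursor = 0
  }

  // Record a connection failure so the upstream is skipped for a while.
  markDown (url) {
    this.downUntil.set(url, this.now() + this.cooldownMs)
  }

  isUp (url) {
    return (this.downUntil.get(url) ?? 0) <= this.now()
  }

  // Round-robin over the upstreams, skipping those in cooldown.
  // Returns undefined when every upstream is marked down.
  next () {
    for (let i = 0; i < this.urls.length; i++) {
      const url = this.urls[(this.cursor + i) % this.urls.length]
      if (this.isUp(url)) {
        this.cursor = (this.cursor + i + 1) % this.urls.length
        return url
      }
    }
    return undefined
  }
}

// Try each available upstream at most once. On ECONNREFUSED, mark the
// upstream down and move on to the next one; rethrow any other error.
async function sendWithFailover (tracker, send) {
  let lastError
  for (let attempt = 0; attempt < tracker.urls.length; attempt++) {
    const url = tracker.next()
    if (url === undefined) break
    try {
      return await send(url)
    } catch (err) {
      const code = err.code ?? err.cause?.code
      if (code !== 'ECONNREFUSED') throw err
      tracker.markDown(url)
      lastError = err
    }
  }
  throw lastError ?? new Error('no upstream available')
}
```

With the two endpoints from the original report, a tracker created as new UpstreamTracker(['http://endpoint1:4001/graphql', 'http://endpoint2:4001/graphql']) would stop selecting endpoint2 for the cooldown period after its first refused connection, and the failed request would be retried on endpoint1 instead of surfacing the error. Note that this only covers refused connections; as mentioned earlier in the thread, truncated existing sockets are a separate failure mode.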