ipni / storetheindex

A directory of CIDs

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

100% latency increase on routing/v1 API

guseggert opened this issue · comments

Something I noticed this morning from Hydra metrics: at around 2022-12-16T10:36:00Z, the cid.contact/routing/v1 endpoint experienced a sudden 100% increase in latency, p50 went from ~30ms to ~60ms. This doesn't correlate with any increase in request rate to Hydras or other client-side metrics, so it seems likely to be a server-side issue. Perhaps there was a deployment around that time?

Graph:
Screenshot_20221216_090300

(source)

Assuming times are in UTC?

The UTC time is in the message (2022-12-16T10:36:00Z), the image is in my local tz (UTC-5). Sorry for that confusion.

Thanks Gus; looking

Fixed graph to use UTC tz

Origin latency at CloudFront, and upstream latency at nginx ingress controller both look flat. There has not been any deployments today.

This leaves me to believe there is either something we are not measuring at cid.contact or the root cause is outside of cid.contact. I am trying to think if there is an alternative source via which we could confirm the latency increase; do gateways observe the same latency increase for example (even though they are not using the new HTTP delegated routing endpoint)? Under the hood the requests are translated and hit the same execution path down the line.

image

image

It's certainly possible that this is caused by something internal to the Hydras, causing them to write the req and read the response more slowly, but those are usually gradual degradations or align with some other metric like request rate.

Also I do not see this same spike from prod Hydras which are still running reframe.

Cannot reproduce (no more hydras), and no follow-up on this specific issue.