openresty / openresty

High Performance Web Platform Based on Nginx and LuaJIT

Home Page: https://openresty.org


When a large number of requests are time-consuming, they affect each other, causing 499

funky-eyes opened this issue

When some requests have very high RT (response time), other normal requests stop responding normally, resulting in 499 errors. Once the application responsible for the high RT fixes its latency, the 499 errors disappear.
I don't know whether this is related to the version of OpenResty I use.
openresty/1.13.6.2
centos: 7.9.2009
kernel: 3.10.0-1160.81.1.el7.x86_64
ulimit open files: 1000000

HTTP 499 in Nginx means that the client closed the connection before the server answered the request. In my experience it is usually caused by a client-side timeout. So it seems your upstream server had not responded by the time the HTTP client's timeout fired.
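A quick way to tell where the time is going is to log how long nginx waited on the upstream next to the total request time. A minimal sketch (the log format name `timing` and the log file path are arbitrary; `$request_time`, `$upstream_response_time`, and `$upstream_addr` are standard nginx variables):

```nginx
# Goes inside the http {} block of nginx.conf.
# req_time is the total time nginx spent on the request;
# up_time is how long the upstream took ("-" if it never answered).
log_format timing '$remote_addr "$request" status=$status '
                  'req_time=$request_time up_time=$upstream_response_time '
                  'upstream=$upstream_addr';

access_log logs/timing.log timing;
```

For 499 entries, comparing req_time with up_time shows how long nginx had been waiting on the upstream at the moment the client gave up.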

"a request has a large number of high rt"
What exactly does "high rt" mean?
Does it mean that the request blocks the nginx event cycle so that nginx cannot serve other requests?
Or does it mean that the upstream of the high-rt request responds slowly?

If it is the latter, then that is abnormal.
If it is the former, you can try to yield inside the high-rt request, or avoid using blocking APIs.
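To make the distinction concrete, here is a minimal sketch of the difference between a blocking call and a yielding one in OpenResty (the two location names are only for illustration):

```nginx
# Both locations "wait" one second, but they behave very differently
# under concurrency.
location /blocking-demo {
    content_by_lua_block {
        -- BAD: os.execute() blocks the whole nginx worker process for
        -- one second, so every other request handled by this worker
        -- stalls. Under load this is the kind of thing that surfaces
        -- as 499s on unrelated requests.
        os.execute("sleep 1")
        ngx.say("done (blocking)")
    }
}

location /yielding-demo {
    content_by_lua_block {
        -- GOOD: ngx.sleep() yields the current request back to the
        -- nginx event loop, so other requests keep being served while
        -- this one waits.
        ngx.sleep(1)
        ngx.say("done (non-blocking)")
    }
}
```

The same applies to network I/O: cosocket-based clients (ngx.socket.tcp and the lua-resty-* libraries) yield while waiting, whereas LuaSocket, io.popen, blocking calls made through the FFI, and similar APIs hold the whole worker.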

I have multiple upstream services. For example, when the QPS of my service A reaches 230 with an RT of 30 ms, service B is affected by service A, resulting in 499 errors and RT jitter. When I migrate service A to another OpenResty cluster, the problem is solved immediately. Below are the monitoring charts from that time; CPU, IO, and load are all normal, and the OpenResty cluster looks very healthy.
[monitoring charts]

You can use OpenResty XRay to analyze this issue.
Please go to https://xray.openresty.com

I will try using it to analyze the issue, thanks for the reply.