When a large number of requests are time-consuming, they affect each other, causing 499
funky-eyes opened this issue · comments
When some requests have very high response times (rt), other, normal requests stop responding normally, resulting in 499s. Once the application responsible for the high rt fixes the problem, the 499s disappear.
I don't know if it has anything to do with the version of OpenResty I use.
openresty/1.13.6.2
centos:7.9.2009
kernel: 3.10.0-1160.81.1.el7.x86_64
ulimit openfiles: 1000000
HTTP 499 in Nginx means that the client closed the connection before the server answered the request. In my experience it is usually caused by a client-side timeout. So it seems your upstream server does not respond before the HTTP client's timeout expires.
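To confirm that the 499s come from a client-side timeout rather than from nginx itself, it can help to log per-request timing. A minimal sketch using standard nginx variables (the log format name and file path here are arbitrary):

```nginx
# $request_time           -- total time nginx spent on the request
# $upstream_response_time -- time spent waiting on the upstream
log_format timing '$remote_addr "$request" $status '
                  'req_time=$request_time upstream_time=$upstream_response_time';

access_log /var/log/nginx/access_timing.log timing;
```

If the 499 entries show a large `$request_time` while `$upstream_response_time` is empty or still growing, the client gave up waiting before the upstream ever answered.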
a request has a large number of high rt
What exactly does the high rt mean?
Does it mean the request will block the nginx cycle and nginx cannot serve other requests?
Or does it mean that the upstream of the high rt request responds slowly?
If it is the latter, then it is abnormal.
If it is the former, you can try to yield in the high rt request, or avoid using blocking APIs.
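To illustrate the difference between a blocking call and a yielding one: a hedged sketch of OpenResty Lua (meant to run inside a handler such as `content_by_lua_block`; the address and port are placeholders), using the real `ngx.sleep` and cosocket APIs:

```lua
-- BAD: os.execute blocks the entire nginx worker process for 2 seconds;
-- every other request handled by that worker stalls with it.
-- os.execute("sleep 2")

-- GOOD: ngx.sleep yields the current request's coroutine back to the
-- nginx event loop, so other requests keep being served meanwhile.
ngx.sleep(2)

-- Likewise, prefer the non-blocking cosocket API over blocking sockets:
local sock = ngx.socket.tcp()   -- yields on network I/O instead of blocking
sock:settimeout(1000)           -- 1s timeout, in milliseconds
local ok, err = sock:connect("127.0.0.1", 8080)
if not ok then
    ngx.log(ngx.ERR, "connect failed: ", err)
end
```

If any request handler uses a blocking API like this, one slow request can delay every other request multiplexed on the same worker, which matches the symptom described in this issue.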
I have multiple upstream services. For example, when the QPS of my service A reaches 230 with an rt of 30 ms, service B is affected by service A, resulting in 499s and rt jitter. When I migrated service A to another OpenResty cluster, the problem was solved immediately. The monitoring charts from that time show nothing abnormal in CPU, IO, or load; the OpenResty cluster is very healthy.
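One common way for a slow service to drag down its neighbors on the same proxy is shared connection handling without per-service limits. A hedged sketch of isolating two services inside one OpenResty instance (the upstream addresses, paths, and timeout values below are hypothetical, not taken from this issue) is to give each service its own `upstream` block, its own keepalive pool, and tight per-location timeouts so a slow service A fails fast instead of tying up resources:

```nginx
upstream service_a {
    server 10.0.0.10:8080;   # hypothetical backend for service A
    keepalive 32;            # A's own keepalive connection pool
}

upstream service_b {
    server 10.0.0.20:8080;   # hypothetical backend for service B
    keepalive 32;            # B's pool is independent of A's
}

server {
    listen 80;

    location /a/ {
        proxy_pass http://service_a;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        # Fail fast on the slow service so requests do not pile up.
        proxy_connect_timeout 1s;
        proxy_read_timeout    5s;
    }

    location /b/ {
        proxy_pass http://service_b;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_connect_timeout 1s;
        proxy_read_timeout    5s;
    }
}
```

This does not explain why the interference happens, but it bounds the blast radius: even if A's rt spikes, requests to A are cut off at 5 s rather than accumulating until clients start aborting with 499.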
You can use OpenResty XRay to analyze this issue.
Please go to https://xray.openresty.com
I will try to use it to analyze the issue, thanks for the reply.