Spring Cloud Gateway with Reactor Netty: thread switch takes too long

Question

will-zdu opened this issue 6 months ago · comments

zdu commented 6 months ago

Expected Behavior

The thread switch should be quick.

Actual Behavior

The thread switch takes too long.

Steps to Reproduce

Use Spring Cloud Gateway's org.springframework.cloud.gateway.filter.NettyRoutingFilter#filter with:
Spring Cloud Gateway: 2.2.6.RELEASE
Reactor Netty: 0.9.10

@Override
	public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
		URI requestUrl = exchange.getRequiredAttribute(GATEWAY_REQUEST_URL_ATTR);

                
		logger.info("log1"); // log1: emitted when filter() is invoked
		return this.httpClient.request(method, url, req -> {
			final HttpClientRequest proxyRequest = req.options(NettyPipeline.SendOptions::flushOnEach)
					.headers(httpHeaders)
					.chunkedTransfer(chunkedTransfer)
					.failOnServerError(false)
					.failOnClientError(false);

			if (preserveHost) {
				String host = request.getHeaders().getFirst(HttpHeaders.HOST);
				proxyRequest.header(HttpHeaders.HOST, host);
			}

			logger.info("log2"); // log2: emitted when the request callback runs
			return proxyRequest.sendHeaders() //I shouldn't need this
					.send(request.getBody().map(dataBuffer ->
							((NettyDataBuffer)dataBuffer).getNativeBuffer()));
		}).doOnNext(res -> {
			ServerHttpResponse response = exchange.getResponse();
			// put headers and status so filters can modify the response
			HttpHeaders headers = new HttpHeaders();

			res.responseHeaders().forEach(entry -> headers.add(entry.getKey(), entry.getValue()));

			exchange.getAttributes().put("original_response_content_type", headers.getContentType());

			HttpHeaders filteredResponseHeaders = HttpHeadersFilter.filter(
					this.headersFilters.getIfAvailable(), headers, exchange, Type.RESPONSE);
			
			response.getHeaders().putAll(filteredResponseHeaders);
			HttpStatus status = HttpStatus.resolve(res.status().code());
			if (status != null) {
				response.setStatusCode(status);
			} else if (response instanceof AbstractServerHttpResponse) {
				// https://jira.spring.io/browse/SPR-16748
				((AbstractServerHttpResponse) response).setStatusCodeValue(res.status().code());
			} else {
				throw new IllegalStateException("Unable to set status code on response: " +res.status().code()+", "+response.getClass());
			}

			// Defer committing the response until all route filters have run
			// Put client response as ServerWebExchange attribute and write response later NettyWriteResponseFilter
			exchange.getAttributes().put(CLIENT_RESPONSE_ATTR, res);
		}).then(chain.filter(exchange));
	}

The gap between log1 and log2 is far too long: about 6 seconds.
CPU, load average, GC, and safepoint metrics all look normal, and other requests are still processed normally.
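
One way to reason about a gap like this is to measure how long a task waits before the target thread picks it up. The sketch below is not Gateway or Reactor Netty code: a plain single-threaded `ExecutorService` stands in for one event loop thread, and the names `HandoffLatencyDemo` and `measureHandoffDelayMillis` are invented for illustration. It shows that when something else occupies that single thread, a newly submitted task waits roughly as long as the thread stays busy, even while CPU, GC, and load look normal. Whether this is what happens in the reported case is only an assumption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; a single-threaded executor stands in for one event loop thread.
final class HandoffLatencyDemo {

    // Returns how long (in ms) a task waited in the queue while the
    // stand-in event loop was busy with a blocking task of busyMillis.
    static long measureHandoffDelayMillis(long busyMillis) {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        try {
            CountDownLatch busyStarted = new CountDownLatch(1);

            // Task 1: occupies the only thread, like blocking work on an event loop.
            eventLoop.execute(() -> {
                busyStarted.countDown();
                try {
                    Thread.sleep(busyMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            // Wait until the blocking task is actually running.
            busyStarted.await();

            // Corresponds to "log1": timestamp taken on the submitting thread.
            long submittedAt = System.nanoTime();

            // Corresponds to "log2": timestamp taken when the event loop runs the task.
            Callable<Long> stamp = System::nanoTime;
            Future<Long> startedAt = eventLoop.submit(stamp);

            return TimeUnit.NANOSECONDS.toMillis(startedAt.get() - submittedAt);
        } catch (Exception e) {
            throw new IllegalStateException("measurement failed", e);
        } finally {
            eventLoop.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("handoff delay ms: " + measureHandoffDelayMillis(200));
    }
}
```

In a real application the same idea would mean logging the thread name and a timestamp at both points and checking, with a thread dump taken during the delay, what the target event loop thread is doing.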

Possible Solution

Your Environment

Reactor version(s) used: 0.9.10
Other relevant libraries versions (eg. netty, ...):
JVM version (java -version):
OS and version (eg. uname -a):

Violeta Georgieva · Answer 1 · Mon Jan 08 2024 20:51:51 GMT+0800 (China Standard Time)

@will-zdu You are using an unsupported version. There is also a known regression in 0.9.10 that is fixed in 0.9.14. I would strongly encourage you to update your versions.

https://github.com/reactor/reactor-netty/releases/tag/v0.9.14.RELEASE

zdu · Answer 2 · Wed Jan 10 2024 13:45:59 GMT+0800 (China Standard Time)

@violetagg #1371
I have seen your change, but I still don't understand the root cause of this problem. How does subscribe not executing on the event loop cause a race condition with request, and how does that lead to the long execution time? Could you explain it, share the key information, or point me to the relevant documentation?
My understanding of the possible cause is as follows.
In reactor.netty.channel.FluxReceive#drainReceiver, if "receiver" is empty, the loop keeps checking until "receiver" is no longer empty, and then it proceeds. However, I still haven't found the reason why most of the delays are about 6 seconds. Is that related to some configuration?
If the request call is executed before subscribe, and subscribe does not run on the event loop, could it be that receiver is empty while the event loop thread keeps looping and, because receiver is not volatile, the event loop thread keeps seeing a stale cached value even after receiver has been set?
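
The question turns on whether a write to `receiver` made on one thread is visible to the event loop thread. As a general Java memory model point, independent of how Reactor Netty actually implements `FluxReceive`, a non-volatile field is safe when every read and write happens on the same thread, and submitting a task to an executor establishes a happens-before edge between the submitter and the task. The sketch below is a simplified, hypothetical model: the class `ConfinedReceiveQueue` and its methods are invented here, and a single-thread executor stands in for the event loop. It is not the library's code, and it neither confirms nor refutes the hypothesis about the 6-second delay.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical, simplified model; NOT the real reactor.netty.channel.FluxReceive.
final class ConfinedReceiveQueue {
    private final ExecutorService eventLoop = Executors.newSingleThreadExecutor();
    // Both fields are touched only on the event loop thread, so neither needs volatile.
    private final Queue<String> buffered = new ArrayDeque<>();
    private Consumer<String> receiver;

    // Callable from any thread: inbound data is handed over to the event loop.
    void onInbound(String msg) {
        eventLoop.execute(() -> { buffered.add(msg); drain(); });
    }

    // Callable from any thread: the receiver is installed on the event loop.
    void subscribe(Consumer<String> r) {
        eventLoop.execute(() -> { receiver = r; drain(); });
    }

    // Runs only on the event loop. Returns at once when there is no receiver (no spinning).
    private void drain() {
        if (receiver == null) {
            return;
        }
        String msg;
        while ((msg = buffered.poll()) != null) {
            receiver.accept(msg);
        }
    }

    // Blocks until every task queued so far has run.
    void awaitIdle() {
        try {
            eventLoop.submit(() -> { }).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    void close() {
        eventLoop.shutdownNow();
    }

    // Data arrives before subscribe, and subscribe is called from another thread.
    static List<String> demo() {
        ConfinedReceiveQueue queue = new ConfinedReceiveQueue();
        List<String> received = new CopyOnWriteArrayList<>();
        try {
            queue.onInbound("a");
            queue.onInbound("b");
            Thread subscriber = new Thread(() -> queue.subscribe(received::add));
            subscriber.start();
            subscriber.join();
            queue.onInbound("c");
            queue.awaitIdle();
            return new ArrayList<>(received);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            queue.close();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this model the drain is triggered by events (new data or a new receiver) instead of polling a field written by another thread, so the buffered items are delivered as soon as the subscribe task runs on the event loop.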

Violeta Georgieva · Answer 3 · Wed Jan 10 2024 15:48:09 GMT+0800 (China Standard Time)

@will-zdu Did you update at least to the latest available 0.9.x release?

zdu · Answer 4 · Wed Jan 10 2024 16:13:54 GMT+0800 (China Standard Time)

@violetagg I have switched to 0.9.14.RELEASE but still saw the timeout. After restarting and checking again, however, no timeout occurred.

Violeta Georgieva · Answer 5 · Wed Jan 10 2024 20:00:20 GMT+0800 (China Standard Time)

@will-zdu If 0.9.14 doesn't solve the issue, I can only recommend upgrading to a supported version and, if the problem still exists, providing a reproducible example.

zdu · Answer 6 · Thu Jan 11 2024 10:18:25 GMT+0800 (China Standard Time)

@violetagg The problem reproduced again with 0.9.14 in our production environment. We are now trying the latest available 0.9.x release.

Violeta Georgieva · Answer 7 · Mon Jan 22 2024 14:01:43 GMT+0800 (China Standard Time)

@will-zdu I'm closing this one. Please upgrade to a supported version and if the problem still exists, provide some reproducible example.