resilience4j / resilience4j

Resilience4j is a fault tolerance library designed for Java8 and functional programming

Reactor Circuit Breaker stuck in HALF_OPEN state

adam-kw opened this issue · comments

commented

Hi, we are using <artifactId>spring-cloud-starter-circuitbreaker-reactor-resilience4j</artifactId> version 2.1.1. We hadn't encountered any problems with it until now. Through some debugging I found that the circuit breaker can't transition from HALF_OPEN to CLOSED and never recovers, even though all the responses we get are 200 (almost all; see step 9 below).

config:
        failureRateThreshold: 50
        waitDurationInOpenStateInMilis: 10000
        permittedNumberOfCallsInHalfOpenState: 10
        minimumNumberOfCalls: 50
        slidingWindowSize: 100
        slowCallRateThreshold: 50
        slowCallDurationThresholdInMillis: 10000

Code similar to ours:

getAllAnimalDetails() {
	return getAllAnimals() //webClient call, returns Mono<List<Animal>>
		.flatMap(animals -> invokeGetAnimalsDetails(animals)) //for every animal make webClient call for its details
		.map(mapper::mapToResponse);
}

What happens in steps:
* Two separate CBs are configured: one monitors the getAllAnimals request, the other the getAnimalDetails requests
* getAllAnimals returns 4 results, so each controller call makes 4 getAnimalDetails calls (only in this test scenario, of course)

  1. OPEN state
  2. waits 10s
  3. HALF_OPEN state
  4. call to getAllAnimals (returns 4 animals)
  5. 4 calls for details (200 response)
  6. call to getAllAnimals (returns 4 animals)
  7. 4 calls for details (200 response)
  8. call to getAllAnimals (returns 4 animals)
  9. 2 calls for details (only request logged)
  10. CircuitBreaker 'invokeGetAnimalDetails' recorded a call which was not permitted.
  11. 503 response
  12. Every subsequent request ends with a 503 response (logged with an event like the one in step 10), and the CB state remains HALF_OPEN

It looks like the CB is rejecting 2 out of 4 calls because the configuration allows it to permit only 10 calls in the HALF_OPEN state. But why doesn't it work like this instead:

  1. call to getAllAnimals (returns 4 animals)
  2. 2 calls for details (200 response)
  3. CLOSED state
  4. 2 calls for details (200 response)
  5. 200 response
commented

The issue was how we handled the HTTP calls for details in a single flatMap. Refactoring getAllAnimals() to return a Flux and then calling for details in concatMap() resolves it; the steps are then exactly as I wrote at the end of my previous post.
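
For anyone hitting the same thing, here is a rough way to picture the difference. This is only a toy model in plain Java, not Resilience4j's or Reactor's actual code. The class HalfOpenPermitModel and all of its methods are hypothetical names made up for illustration. It assumes that a call takes a permit when it starts and records its result when it finishes, that a single flatMap starts all 4 detail calls before any of them finishes, and that calls cancelled after a rejection never record a result or hand their permit back. That last assumption was chosen to match the log above and may not hold for every library version.

```java
/** Toy model of a half-open circuit breaker. NOT Resilience4j's actual implementation. */
public class HalfOpenPermitModel {

    enum State { HALF_OPEN, CLOSED }

    // Mirrors permittedNumberOfCallsInHalfOpenState: 10 from the config in the post.
    static final int PERMITTED_CALLS = 10;

    State state = State.HALF_OPEN;
    int permitsLeft = PERMITTED_CALLS;
    int recordedSuccesses = 0;

    /** A call asks for permission when it starts. */
    boolean tryAcquire() {
        if (state == State.CLOSED) {
            return true;
        }
        if (permitsLeft == 0) {
            return false; // corresponds to "recorded a call which was not permitted"
        }
        permitsLeft--;
        return true;
    }

    /** A call records success when it completes; the breaker closes after 10 recorded results. */
    void onSuccess() {
        if (state == State.HALF_OPEN) {
            recordedSuccesses++;
            if (recordedSuccesses == PERMITTED_CALLS) {
                state = State.CLOSED;
            }
        }
    }

    /** Models a single flatMap: every detail call starts before any of them completes. */
    boolean handleRequestEagerly(int detailCalls) {
        for (int i = 0; i < detailCalls; i++) {
            if (!tryAcquire()) {
                // Assumption: the rejection fails the whole request, and the calls already
                // in flight are cancelled without recording a result or returning a permit.
                return false;
            }
        }
        for (int i = 0; i < detailCalls; i++) {
            onSuccess();
        }
        return true;
    }

    /** Models concatMap: each detail call completes before the next one starts. */
    boolean handleRequestSequentially(int detailCalls) {
        for (int i = 0; i < detailCalls; i++) {
            if (!tryAcquire()) {
                return false;
            }
            onSuccess();
        }
        return true;
    }

    public static void main(String[] args) {
        HalfOpenPermitModel eager = new HalfOpenPermitModel();
        HalfOpenPermitModel sequential = new HalfOpenPermitModel();
        for (int request = 1; request <= 4; request++) {
            eager.handleRequestEagerly(4);
            sequential.handleRequestSequentially(4);
        }
        System.out.println("eager:      " + eager.state);      // HALF_OPEN
        System.out.println("sequential: " + sequential.state); // CLOSED
    }
}
```

In the eager variant the third request takes the last 2 permits and is then rejected on its third call, so only 8 results are ever recorded and the model stays in HALF_OPEN. In the sequential variant the 10th call records its result before the 11th asks for permission, so the breaker is already CLOSED by then and the request completes.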