spring-cloud / spring-cloud-stream

Framework for building Event-Driven Microservices

Home Page: http://cloud.spring.io/spring-cloud-stream

Retry not executing code

JMesens opened this issue

I have run into a problem with the RetryTemplate. I'm building a client for Jira (issue tracking) with Feign and Spring Cloud Stream. The issues that need to be produced are sent to the component over HTTP; the component then enqueues them on a queue (for buffering, since our on-premise Jira is very unstable) using Spring Cloud Stream (Chelsea.SR2) and the RabbitMQ binder. The same component also listens to that queue and forwards the issues to Jira using Feign.
The problem is that Spring Cloud Stream doesn't always perform the retry correctly (yes, sometimes it works, but not always). In the wrong flow you can see that JiraClient isn't invoked. (Yes, we have two levels of retry: the Jira client retries 5 times, and the message itself should be retried for a long time, up to Integer.MAX_VALUE attempts.)

Expected flow:

[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (26ms)
[      main] s.b.c.e.t.TomcatEmbeddedServletContainer : Tomcat started on port(s): 8080 (http)
[      main] b.k.a.SuggestionsApplication             : Started SuggestionsApplication in 9.452 seconds (JVM running for 9.873)
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> RETRYING
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> RETRYING
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> RETRYING
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> RETRYING
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] o.s.r.backoff.ExponentialBackOffPolicy   : Sleeping for 1000
[Consumer-1] o.s.retry.support.RetryTemplate          : Checking for rethrow: count=1
[Consumer-1] o.s.retry.support.RetryTemplate          : Retry: count=1
[Consumer-1] Handler$$EnhancerBySpringCGLIB$$a467b7f4 : Dequeued message
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> RETRYING
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> RETRYING
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> RETRYING
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> RETRYING
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] o.s.r.backoff.ExponentialBackOffPolicy   : Sleeping for 2000
[Consumer-1] o.s.retry.support.RetryTemplate          : Checking for rethrow: count=2
[Consumer-1] o.s.retry.support.RetryTemplate          : Retry: count=2
[Consumer-1] Handler$$EnhancerBySpringCGLIB$$a467b7f4 : Dequeued Message
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> RETRYING
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> RETRYING
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> RETRYING
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> RETRYING
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient      : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] o.s.r.backoff.ExponentialBackOffPolicy   : Sleeping for 4000

Wrong flow:

[Consumer-1] o.s.retry.support.RetryTemplate          : Retry: count=0
[Consumer-1] o.s.r.backoff.ExponentialBackOffPolicy   : Sleeping for 1000
[      main] d.s.w.p.DocumentationPluginsBootstrapper : Found 1 custom documentation plugin(s)
[      main] s.d.s.w.s.ApiListingReferenceScanner     : Scanning for api listing references
[      main] s.b.c.e.t.TomcatEmbeddedServletContainer : Tomcat started on port(s): 8080 (http)
[      main] b.k.a.SuggestionsApplication             : Started SuggestionsApplication in 9.31 seconds (JVM running for 9.737)
[Consumer-1] o.s.retry.support.RetryTemplate          : Checking for rethrow: count=1
[Consumer-1] o.s.retry.support.RetryTemplate          : Retry: count=1
[Consumer-1] o.s.r.backoff.ExponentialBackOffPolicy   : Sleeping for 2000
[Consumer-1] o.s.retry.support.RetryTemplate          : Checking for rethrow: count=2
[Consumer-1] o.s.retry.support.RetryTemplate          : Retry: count=2
[Consumer-1] o.s.r.backoff.ExponentialBackOffPolicy   : Sleeping for 4000
[Consumer-1] o.s.retry.support.RetryTemplate          : Checking for rethrow: count=3
[Consumer-1] o.s.retry.support.RetryTemplate          : Retry: count=3
[Consumer-1] o.s.r.backoff.ExponentialBackOffPolicy   : Sleeping for 8000

Code:

    @StreamListener(JIRA_QUEUE)
    public void handleIssue(Issue issue) {
        if (issue != null) {
            log.info("Dequeued message");
            jiraClient.createIssue(new IssueResource(issue));
        }
    }

Configuration:

spring.cloud.stream.bindings.jiraQueue:
  content-type: avro/bytes
  group: jiraConsumer
  requiredGroups: jiraConsumer
  consumer.maxAttempts: 2147483647
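For context on the "Sleeping for 1000 / 2000 / 4000 / 8000" lines in both flows: they come from the binder's RetryTemplate with an ExponentialBackOffPolicy. The pure-JDK sketch below models that schedule; it is not Spring Retry's code. The values (1000 ms initial interval, multiplier 2.0, 10000 ms cap) are assumptions that match the logs and, as far as I know, the defaults of the backOffInitialInterval, backOffMultiplier and backOffMaxInterval consumer properties; check them against your binder version. With maxAttempts at 2147483647 the sleep simply stays at the cap.

```java
import java.util.ArrayList;
import java.util.List;

// Model of an exponential back-off schedule (not Spring Retry's implementation).
// Parameter values passed in main are assumptions chosen to match the logged sleeps.
class BackoffSchedule {

    /** Returns the sleep in ms before each retry, for the given total number of attempts. */
    static List<Long> sleeps(int maxAttempts, long initialMs, double multiplier, long maxMs) {
        List<Long> result = new ArrayList<>();
        long interval = initialMs;
        // maxAttempts counts the first delivery too, so there are maxAttempts - 1 back-offs
        for (int attempt = 1; attempt < maxAttempts; attempt++) {
            result.add(Math.min(interval, maxMs));
            // grow the interval, but never beyond the cap
            interval = (long) Math.min(interval * multiplier, maxMs);
        }
        return result;
    }

    public static void main(String[] args) {
        // Do not call this with Integer.MAX_VALUE: it materializes every sleep in a list
        System.out.println(sleeps(7, 1000, 2.0, 10_000)); // [1000, 2000, 4000, 8000, 10000, 10000]
    }
}
```

The first four values are exactly the ones in the logs; after that the interval is capped, so a very large maxAttempts means retrying every 10 seconds rather than sleeping ever longer.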

Since there is no line like:

[Consumer-1] Handler$$EnhancerBySpringCGLIB$$a467b7f4 : Dequeued Message

in the wrong flow, I can only assume that either the target handleIssue(Issue issue) method isn't selected because of an incompatible content type (the payload can't be converted by the avro/bytes converter), or your issue is null and execution never enters the block because of your if (issue != null) { check.

Does that make sense to you?

Could you debug your app when you hit that wrong-flow case?

I tried using application/json and it has the same effect: sometimes it works, sometimes it doesn't.
The null check isn't reached either.
While debugging I found that the lastException in the RetryTemplate has a cause with the detail message: Dispatcher has no subscribers for channel 'unknown.channel.name'.

Ah, this one!

Would you mind sharing more of the stack trace for that "Dispatcher has no subscribers" error?
And more config, please, not only for the Rabbit binder: especially whatever sends to that JIRA_QUEUE.

The issue seems obsolete to me, but it looks like we may still have a race condition where the ListenerContainer has already started while there is still no consumer for the JIRA_QUEUE MessageChannel.

OK, let's not speculate! Please show more of the stack trace and we'll see.

It is easy to reproduce: https://github.com/JMesens/asyncHttp

OK. Look. You have this:

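import org.springframework.cloud.stream.annotation.Input;
import org.springframework.cloud.stream.annotation.Output;
import org.springframework.messaging.MessageChannel;
import org.springframework.messaging.SubscribableChannel;

public interface IssueQueue {

    String JIRA_QUEUE = "issueQueue";

    // Same name for both the @Input and the @Output binding
    @Input(JIRA_QUEUE)
    SubscribableChannel listenIssue();

    @Output(JIRA_QUEUE)
    MessageChannel pushIssue();
}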

It seems to me you are misusing the Spring Cloud Stream binding functionality by giving the input and the output the same name, "issueQueue". In this case the @StreamListener is subscribed to the pushIssue MessageChannel proxy, alongside the SendingHandler that forwards messages to the target RabbitMQ destination.

Since sending via Feign fails, the @StreamListener throws an exception and, according to the round-robin logic, the dispatcher moves on to the next subscriber: the SendingHandler. The message is enqueued to RabbitMQ and the consumer starts working. Here the AmqpInboundChannelAdapter has listenIssue as its outputChannel. But voila! That channel has no subscribers, because our @StreamListener is on pushIssue.
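The failure mode described above can be reproduced without Spring or RabbitMQ. The following is a hypothetical, heavily simplified model: the Channel class, its round-robin-with-failover dispatch and the in-memory deque standing in for RabbitMQ are invented for illustration and are not Spring Integration's API. The pushIssue channel gets two subscribers (the listener, which fails because Jira is unreachable, and a sending handler); failover hands the message to the sending handler, and when the "broker" delivers it to listenIssue, which has no subscribers, dispatch fails with the same kind of error seen in lastException (the model prints the channel name where the real log says 'unknown.channel.name').

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model of point-to-point channels with round-robin dispatch and
// failover; not Spring Integration's real classes.
class MisboundChannels {

    static class Channel {
        private final String name;
        private final List<Consumer<String>> subscribers = new ArrayList<>();
        private int next = 0; // round-robin start index

        Channel(String name) { this.name = name; }

        void subscribe(Consumer<String> handler) { subscribers.add(handler); }

        void send(String message) {
            if (subscribers.isEmpty()) {
                throw new IllegalStateException(
                        "Dispatcher has no subscribers for channel '" + name + "'");
            }
            int start = next;
            next = (next + 1) % subscribers.size();
            RuntimeException last = null;
            // Failover: try each subscriber once, starting at the round-robin index
            for (int i = 0; i < subscribers.size(); i++) {
                try {
                    subscribers.get((start + i) % subscribers.size()).accept(message);
                    return;
                } catch (RuntimeException e) {
                    last = e;
                }
            }
            throw last;
        }
    }

    /** Sends one message through the mis-bound setup and reports what the consumer side saw. */
    static String run() {
        Deque<String> broker = new ArrayDeque<>(); // stands in for the RabbitMQ queue
        Channel pushIssue = new Channel("pushIssue");
        Channel listenIssue = new Channel("listenIssue"); // nobody subscribes here

        // The listener ended up on the output channel; Jira is down, so it throws
        pushIssue.subscribe(msg -> { throw new RuntimeException("UnknownHostException: jira.be"); });
        // The sending handler forwards the message to the broker
        pushIssue.subscribe(broker::add);

        pushIssue.send("issue-1"); // listener fails, failover enqueues the message

        // Consumer side: the inbound adapter delivers the queued message to listenIssue
        try {
            listenIssue.send(broker.poll());
            return "delivered";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // Dispatcher has no subscribers for channel 'listenIssue'
    }
}
```

In this model the consumer-side failure happens on every redelivery, which is consistent with the wrong-flow log: the RetryTemplate keeps backing off, but JiraClient is never invoked.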

So, to fix your problem, consider splitting the producer and the consumer into different applications, or use different names for the @Input and @Output definitions; you can still bind both to the same target RabbitMQ destination via the spring.cloud.stream.bindings properties in application.yml.

Meanwhile, I think this is definitely a bug: DispatchingStreamListenerMessageHandler must subscribe to the binding channel marked with @Input.

Well, after a closer look, we indeed have to reject such a configuration, because we can't register several beans with the same name.

There might be something like a target attribute on @Input/@Output, though. But we can't simply rely on the same name for both the target destination and the bean names.

@viniciusccarvalho , WDYT?

I think the confusion comes from how we communicate that the value of the annotation is really the bean name, not the destination; we then use it as the default destination.

We should definitely revisit this in 2.0. For now the fix is simple: use different names for the channels and set spring.cloud.stream.bindings.listenIssue.destination=issueQueue and spring.cloud.stream.bindings.pushIssue.destination=issueQueue.
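To illustrate the naming rule behind this fix, here is a hypothetical pure-JDK sketch: the annotation value is the binding (bean) name, and the destination defaults to that name unless a spring.cloud.stream.bindings.<name>.destination property overrides it. The resolveDestination helper is invented for illustration; it only mimics the rule described above and is not the binder's actual code.

```java
import java.util.Properties;

// Hypothetical model of binding-name -> destination resolution:
// the destination defaults to the binding name unless a property overrides it.
class DestinationResolution {

    static final String PREFIX = "spring.cloud.stream.bindings.";

    static String resolveDestination(String bindingName, Properties props) {
        return props.getProperty(PREFIX + bindingName + ".destination", bindingName);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // The workaround: two distinct binding names sharing one destination
        props.setProperty(PREFIX + "listenIssue.destination", "issueQueue");
        props.setProperty(PREFIX + "pushIssue.destination", "issueQueue");

        System.out.println(resolveDestination("listenIssue", props)); // issueQueue
        System.out.println(resolveDestination("pushIssue", props));   // issueQueue
        System.out.println(resolveDestination("other", props));       // other (no override)
    }
}
```

In the real application the two binding names would come from @Input("listenIssue") and @Output("pushIssue"), and the two properties would live in application.yml; the point is that the bean names differ while the broker destination is shared.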

For 2.0 there's a new wrapper type called BindingInformation that is passed to the binders and currently contains name and contentType, so users can write @Input(value="foo", contentType="application/avro"). We could add destination to the annotation too, which should make things more explicit.

Good. Thank you for confirmation!

So, regardless of the new destination option on the annotation, we should reject any attempt to register binding target beans with the same name.
That seems easy to fix during the BindingService phase, even right now in the current 1.3 version.
The behavior isn't good in 1.2.x either, but there you may consider the fix a breaking change.
Moreover, we have a workaround, although the problem isn't obvious at a glance...
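The proposed check could look roughly like the following sketch, which scans a binding interface and rejects duplicate binding names before any channel beans would be registered. @In and @Out are hypothetical stand-ins for Spring Cloud Stream's @Input and @Output, defined locally (and returning Object instead of channel types) so the sketch runs with the JDK alone; where and how the real framework performs such a check is an assumption here.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

class BindingNameValidator {

    // Hypothetical stand-ins for @Input/@Output: the value is the binding (bean) name
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
    @interface In { String value(); }

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
    @interface Out { String value(); }

    /** Throws IllegalStateException if two methods declare the same binding name. */
    static void validate(Class<?> bindingInterface) {
        Map<String, String> seen = new HashMap<>(); // binding name -> declaring method
        for (Method m : bindingInterface.getDeclaredMethods()) {
            String name = null;
            if (m.isAnnotationPresent(In.class)) {
                name = m.getAnnotation(In.class).value();
            } else if (m.isAnnotationPresent(Out.class)) {
                name = m.getAnnotation(Out.class).value();
            }
            if (name == null) {
                continue; // not a binding method
            }
            String previous = seen.putIfAbsent(name, m.getName());
            if (previous != null) {
                throw new IllegalStateException("Duplicate binding name '" + name
                        + "' on " + previous + "() and " + m.getName() + "()");
            }
        }
    }

    // Same shape as the reporter's IssueQueue
    interface BrokenQueue {
        @In("issueQueue") Object listenIssue();
        @Out("issueQueue") Object pushIssue();
    }

    // The workaround: distinct binding names
    interface FixedQueue {
        @In("listenIssue") Object listenIssue();
        @Out("pushIssue") Object pushIssue();
    }

    public static void main(String[] args) {
        validate(FixedQueue.class); // no exception
        try {
            validate(BrokenQueue.class);
        } catch (IllegalStateException e) {
            // Method order from getDeclaredMethods() is unspecified, so the message order may vary
            System.out.println(e.getMessage());
        }
    }
}
```

Failing fast like this turns the confusing runtime symptom ("Dispatcher has no subscribers") into a startup error that names the offending binding.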

@artembilan Thank you for handling this issue so thoroughly, it was very helpful. I implemented the bindings workaround and it works (more info: https://github.com/JMesens/asyncHttp/commit/e8e5f4bb4536caaf7b60e8423bd32742aece5319).
Thanks a lot!