Retry not executing code
JMesens opened this issue · comments
I have run into a problem with the RetryTemplate. I'm building a client for jira (issue tracking) with feign and spring cloud stream. The issues that need to produced are send to the component over http, then the component enqueues the issues (buffering, our on premise jira is very unstable) on a queue using Spring Cloud Stream (Chelsea.SR2) and RabbitMQ binder. The same component also listens to the queue for sending the issues further to jira using feign.
But the problem is that Spring Cloud Stream not always does the retry correctly. (Yes, sometimes it works, not always..) In the wrong flow you can see JiraClient isn't invoked. (Yes we have multiple retries, jira client does retry 5 times, the message should retry for a long time, int max)
Expected flow:
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (26ms)
[ main] s.b.c.e.t.TomcatEmbeddedServletContainer : Tomcat started on port(s): 8080 (http)
[ main] b.k.a.SuggestionsApplication : Started SuggestionsApplication in 9.452 seconds (JVM running for 9.873)
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> RETRYING
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> RETRYING
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> RETRYING
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> RETRYING
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] o.s.r.backoff.ExponentialBackOffPolicy : Sleeping for 1000
[Consumer-1] o.s.retry.support.RetryTemplate : Checking for rethrow: count=1
[Consumer-1] o.s.retry.support.RetryTemplate : Retry: count=1
[Consumer-1] Handler$$EnhancerBySpringCGLIB$$a467b7f4 : Dequeued message
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> RETRYING
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> RETRYING
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> RETRYING
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> RETRYING
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] o.s.r.backoff.ExponentialBackOffPolicy : Sleeping for 2000
[Consumer-1] o.s.retry.support.RetryTemplate : Checking for rethrow: count=2
[Consumer-1] o.s.retry.support.RetryTemplate : Retry: count=2
[Consumer-1] Handler$$EnhancerBySpringCGLIB$$a467b7f4 : Dequeued Message
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> RETRYING
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> RETRYING
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> RETRYING
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> RETRYING
[Consumer-1] client.JiraClient : [JiraClient#createIssue] ---> POST https://jira.be/jira/rest/api/2/issue HTTP/1.1
[Consumer-1] client.JiraClient : [JiraClient#createIssue] <--- ERROR UnknownHostException: jira.be (0ms)
[Consumer-1] o.s.r.backoff.ExponentialBackOffPolicy : Sleeping for 4000
Wrong flow:
[Consumer-1] o.s.retry.support.RetryTemplate : Retry: count=0
[Consumer-1] o.s.r.backoff.ExponentialBackOffPolicy : Sleeping for 1000
[ main] d.s.w.p.DocumentationPluginsBootstrapper : Found 1 custom documentation plugin(s)
[ main] s.d.s.w.s.ApiListingReferenceScanner : Scanning for api listing references
[ main] s.b.c.e.t.TomcatEmbeddedServletContainer : Tomcat started on port(s): 8080 (http)
[ main] b.k.a.SuggestionsApplication : Started SuggestionsApplication in 9.31 seconds (JVM running for 9.737)
[Consumer-1] o.s.retry.support.RetryTemplate : Checking for rethrow: count=1
[Consumer-1] o.s.retry.support.RetryTemplate : Retry: count=1
[Consumer-1] o.s.r.backoff.ExponentialBackOffPolicy : Sleeping for 2000
[Consumer-1] o.s.retry.support.RetryTemplate : Checking for rethrow: count=2
[Consumer-1] o.s.retry.support.RetryTemplate : Retry: count=2
[Consumer-1] o.s.r.backoff.ExponentialBackOffPolicy : Sleeping for 4000
[Consumer-1] o.s.retry.support.RetryTemplate : Checking for rethrow: count=3
[Consumer-1] o.s.retry.support.RetryTemplate : Retry: count=3
[Consumer-1] o.s.r.backoff.ExponentialBackOffPolicy : Sleeping for 8000
Code:
@StreamListener(JIRA_QUEUE)
public void handleIssue(Issue issue) {
if (issue != null) {
log.info("Dequeued);
jiraClient.createIssue(new IssueResource(issue));
}
}
Configuration:
spring.cloud.stream.bindings.jiraQueue:
content-type: avro/bytes
group: jiraConsumer
requiredGroups: jiraConsumer
consumer.maxAttempts: 2147483647
Since we don't have something like:
[Consumer-1] Handler$$EnhancerBySpringCGLIB$$a467b7f4 : Dequeued Message
in the wrong flow I only can assume that the target handleIssue(Issue issue)
method isn't selected because of incompatible content-type
(can't be converted via avro/bytes
converter) or your issue
is definitely null
and we even don't step into the execution block because of your if (issue != null) {
logic.
Does it make sense to you?
Maybe you can indeed debug your app when you have that Wrong flow
case?
I tried using application/json and it has the same effect, sometimes it works, sometimes it doesn't.
The null check isn't invoked either.
While debugging I found that the lastException
in the RetryTemplate
has as detailed message of cause: Dispatcher has no subscribers for channel 'unknown.channel.name'.
Ah, this one!
Would you mind sharing more stack trace for that Dispatcher has no subscribers
?
And more config, please. Especially who sends to that JIRA_QUEUE
, but not only Rabbit Binder.
The issue seems for me obsolete, but looks like we still may have some race condition when we have already started ListenerContainer
but still don't have consumer for the JIRA_QUEUE
MessageChannel
.
OK, let's don't speculate! Show, please, more stack trace and we'll see.
It is easy to reproduce: https://github.com/JMesens/asyncHttp
OK. Look. You have this:
public interface IssueQueue {
String JIRA_QUEUE = "issueQueue";
@Input(JIRA_QUEUE)
SubscribableChannel listenIssue();
@Output(JIRA_QUEUE)
MessageChannel pushIssue();
}
And seems for me you confuse the Spring Cloud Stream binding functionality with the same "issueQueue"
for input and output. And in this case the @StreamListener
is subscribed to the pushIssue
MessageChannel
proxy alongside with the SendingHandler
to proceed to the target RabbitMQ destination.
Since we fail to send via Feign we get an exception on the @StreamListener
and according round-robing logic move to the next subscriber - SendingHandler
. The message is enqueued to the RabbitMQ and consumer starts to work. In this case the AmqpInboundChannelAdapter
has the outputChannel
as listenIssue
.But voila! This one doesn't have subscribers because our @StreamListener
is on the pushIssue
.
So, to fix your problem, consider to distinguish Producer and Consumer to different application or use different names for the @Input
and @Output
definitions meanwhile you can definitely bind them to the same target RabbitMQ destination via spring.cloud.stream.bindings.
properties in the application.yml
.
Meanwhile I think this is definitely bug and DispatchingStreamListenerMessageHandler
must definitely subscribe to the binding channel marked with the @Input
.
Well, after closer look, we indeed have to reject such a configuration because we can't register several beans for the same name.
There might be something like target
attribute on the @Input/@Output
though. But indeed we can't just rely on the same name for the target destination and bean names.
@viniciusccarvalho , WDYT?
I think the confusion is how we communicate the fact that the value of the annotation is really the bean name not the destination, and then we use it as default for the destination.
We should revisit this on 2.0 for sure, the fix now is simple just use a different name for the channels and use spring.cloud.stream.bindings.listenIssue.destination=issueQueue
and spring.cloud.stream.bindings.pushIssue.destination=issueQueue
For 2.0 there's a new wrapper type called BindingInformation
that is passed to the binders and contains for now name,contenType
so users can do @Input(value="foo", contentType="application/avro")
, we could add destination to the annotation too, that should make things more explicit
Good. Thank you for confirmation!
So, independently of the fact for the new destination
option on the annotation, we should reject an attempt to register the binding target bean for the same name.
Seems for me easy to fix during BindingService
phase. Even right now for the current 1.3
version.
That isn't good behavior for the 1.2.x
as well, but you may consider it as a breaking change.
More over we have a workaround, although it isn't so obvious what is the problem at a glance...
@artembilan Thank you for handling this issue so thorough, it was very helpful. I implemented the bindings work-around and it works. (more info: https://github.com/JMesens/asyncHttp/commit/e8e5f4bb4536caaf7b60e8423bd32742aece5319)
Thanks a lot!