spring-projects / spring-amqp

Spring AMQP - support for Spring programming model with AMQP, especially but not limited to RabbitMQ

Home Page:https://spring.io/projects/spring-amqp

Consumer won't reconnect to rabbitmq cluster properly / main thread won't die

Loki-Afro opened this issue · comments

In what version(s) of Spring AMQP are you seeing this issue?

2.7.18

Describe the bug

The main thread won't exit when the single RabbitMQ consumer has stopped running after a network interruption.
This appears to happen only with a RabbitMQ cluster.

To Reproduce

I do not know the exact circumstances, but I can assist in debugging.
The setup runs in k8s, so:

  1. Set up a 3-node RabbitMQ cluster using the RabbitMQ operator; this also creates a RabbitMQ k8s service.
  2. Create a consumer app:
    @Bean("mailConsumerContainer")
    SimpleMessageListenerContainer container(MailDropConfiguration mailDropConfiguration, ConnectionFactory connectionFactory,
                                             MailConsumer mailConsumer) {
        SimpleMessageListenerContainer container = new SimpleMessageListenerContainer();
        container.setConnectionFactory(connectionFactory);
        container.setQueueNames(mailDropConfiguration.getQueue());
        container.setMessageListener(mailConsumer);
        return container;
    }

     where MailConsumer implements ChannelAwareMessageListener.
  3. Configure the Spring Boot app to connect to the RabbitMQ service using the SPRING_RABBITMQ_HOST env variable.
  4. Restart the cluster, i.e. restart the k8s stateful set so that each node stops and starts again.
  5. At some point the app loses its connection to the RabbitMQ cluster and is able to connect again, but the consumer won't come up again.

The last log messages are:

2023-12-11 12:57:25.801 ERROR 1 --- [    container-2] o.s.a.r.l.SimpleMessageListenerContainer : Stopping container from aborted consumer
2023-12-11 12:57:25.802  INFO 1 --- [    container-2] o.s.a.r.l.SimpleMessageListenerContainer : Waiting for workers to finish.
2023-12-11 12:57:25.803  INFO 1 --- [    container-2] o.s.a.r.l.SimpleMessageListenerContainer : Successfully waited for workers to finish.

Expected behavior

Either the main thread exits, or the consumer starts consuming again.

Sample

This is almost as minimal as it could get: https://github.com/kaffeekrone/mail-drop

To work around the bug I wrote a health indicator to be used as a liveness probe, but I consider this only a workaround. It is also included in the repo.

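import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.actuate.health.AbstractHealthIndicator;
import org.springframework.boot.actuate.health.Health;
import org.springframework.stereotype.Component;

// Reports DOWN once the listener container has stopped, so a k8s liveness probe can restart the pod.
@Component("consumer")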
public class MailConsumerContainerHealthIndicator extends AbstractHealthIndicator {

    private final SimpleMessageListenerContainer simpleMessageListenerContainer;

    public MailConsumerContainerHealthIndicator(@Qualifier("mailConsumerContainer") SimpleMessageListenerContainer simpleMessageListenerContainer) {
        this.simpleMessageListenerContainer = simpleMessageListenerContainer;
    }

    @Override
    protected void doHealthCheck(Health.Builder builder) {
        boolean isRunning = simpleMessageListenerContainer.isRunning();
        if (isRunning) {
            builder.up()
                    .withDetail("isRunning", isRunning)
                    .withDetail("isActive", simpleMessageListenerContainer.isActive())
                    .withDetail("activeConsumerCount", simpleMessageListenerContainer.getActiveConsumerCount());
        } else {
            builder.down()
                    .withDetail("isRunning", isRunning)
                    .withDetail("isActive", simpleMessageListenerContainer.isActive());
        }
    }
}
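
The liveness-probe workaround above leaves the restart to Kubernetes. As a rough alternative, the same check could run inside the process. The sketch below assumes that a boolean "is running" check and a recovery action are all that is needed. `ContainerWatchdog` and its method names are hypothetical and not part of Spring AMQP. In the app above one might pass `container::isRunning` and `container::start`, but whether restarting in place is safe for this failure mode has not been verified here.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: polls a liveness check and runs a recovery action when it fails. */
class ContainerWatchdog implements AutoCloseable {

    private final BooleanSupplier isRunning; // assumption: e.g. container::isRunning
    private final Runnable onStopped;        // assumption: e.g. container::start

    // Daemon thread, so the watchdog itself never keeps the JVM alive.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "container-watchdog");
                t.setDaemon(true);
                return t;
            });

    ContainerWatchdog(BooleanSupplier isRunning, Runnable onStopped) {
        this.isRunning = isRunning;
        this.onStopped = onStopped;
    }

    /** Runs one poll; returns true if the recovery action was triggered. */
    boolean checkOnce() {
        if (isRunning.getAsBoolean()) {
            return false;
        }
        onStopped.run();
        return true;
    }

    /** Starts polling with a fixed delay between polls. */
    void start(long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                checkOnce();
            } catch (RuntimeException e) {
                // Swallow the failure so one bad poll does not cancel future polls.
                System.err.println("watchdog poll failed: " + e);
            }
        }, period, period, unit);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

A caller would create the watchdog once at startup, call `start(30, TimeUnit.SECONDS)` or similar, and close it on shutdown. The polling interval is an arbitrary choice for illustration.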

Additional Question/Suggestion
Is SPRING_RABBITMQ_HOST the correct way to configure that? There is also SPRING_RABBITMQ_ADDRESSES, where one would supply a list of nodes, but since this is k8s I want to make use of the service discovery feature.
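
To make the difference between the two settings concrete: a single host names one endpoint (here the k8s service), while an address list names each node explicitly as comma-separated `host:port` entries. The sketch below parses such a list in plain Java. It is a simplified illustration, not Spring Boot's actual parser; `BrokerAddresses` is a hypothetical name, the node names in the usage note are made up, and URI prefixes or credentials are not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits "host1:5672,host2:5672" into host/port pairs. */
class BrokerAddresses {

    static final int DEFAULT_PORT = 5672; // standard AMQP port without TLS

    /** Simple immutable host/port pair. */
    static final class Address {
        final String host;
        final int port;

        Address(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    static List<Address> parse(String commaSeparated) {
        List<Address> result = new ArrayList<>();
        for (String entry : commaSeparated.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon < 0) {
                // No explicit port: fall back to the default.
                result.add(new Address(trimmed, DEFAULT_PORT));
            } else {
                String host = trimmed.substring(0, colon);
                int port = Integer.parseInt(trimmed.substring(colon + 1));
                result.add(new Address(host, port));
            }
        }
        return result;
    }
}
```

For example, `BrokerAddresses.parse("rabbit-0:5672,rabbit-1:5672,rabbit-2")` yields three entries, the last one with the default port.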

That Spring Boot version is out of Open Source support: https://spring.io/projects/spring-boot#support.

Please consider upgrading to the latest one; there were probably some fixes in between that mitigate the problem.

On the other hand, if it fails on Kubernetes, I believe it may also fail in a local environment, where we would be able to reproduce it.
Another thought: if it fails with Spring AMQP, it would probably also fail with the plain RabbitMQ Client library.
The connection info is just propagated from Spring AMQP down to the client.

Hey @artembilan, according to the website you posted it is not out of commercial support :P (2.7.x: "2025-08-24")

But anyway, I have prepared everything for a Spring upgrade and will test tomorrow with the latest Spring Boot version, locally with a port forward to the cluster.

Yeah... Sorry. That Spring Boot version is out of Open Source support.
If you have commercial support, please follow the respective channels rather than these GitHub issues.

Updating to Spring Boot 3.2 also seems to fix the issue, at first sight:

...
2023-12-13T11:57:45.221+01:00  INFO 1811479 --- [erContainer-113] o.s.a.r.c.CachingConnectionFactory       : Attempting to connect to: [localhost:6672]
2023-12-13T11:57:50.228+01:00  WARN 1811479 --- [erContainer-113] o.s.a.r.l.SimpleMessageListenerContainer : Consumer raised exception, processing can restart if the connection factory supports it. Exception summary: org.springframework.amqp.AmqpConnectException: java.net.ConnectException: Connection refused
2023-12-13T11:57:50.228+01:00  INFO 1811479 --- [erContainer-113] o.s.a.r.l.SimpleMessageListenerContainer : Restarting Consumer@a749d1a: tags=[[]], channel=null, acknowledgeMode=AUTO local queue size=0
...
# consumer happy
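
The log above shows a refused connection attempt followed by a consumer restart. The general pattern of retrying at a fixed interval can be sketched in plain Java as below. `FixedIntervalRetry` is a hypothetical helper for illustration only; it is not how Spring AMQP implements its recovery, and the attempt count and interval are arbitrary parameters.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: retries an action at a fixed interval until it succeeds. */
class FixedIntervalRetry {

    static <T> T retry(Callable<T> action, int maxAttempts, long intervalMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call(); // e.g. open a connection
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Wait before the next attempt; an interrupt aborts the retry loop.
                    Thread.sleep(intervalMillis);
                }
            }
        }
        throw last; // every attempt failed: surface the most recent error
    }
}
```

A caller would wrap the connect step in a `Callable` and let the helper repeat it while the broker is unreachable, for instance during a rolling restart of the cluster nodes.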

I will close this issue for now and continue testing. Thanks @artembilan for pushing me.