[Bug] Broker health check runs into a loop after decommissioning a cluster of bookies

Question

wallacepeng opened this issue a month ago

Wallace Peng commented a month ago

Search before asking

I searched in the issues and found nothing similar.

Read release policy

I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

Version

Pulsar 2.10.5

Minimal reproduce step

Set up two BookKeeper clusters using Helm charts: bookkeeper and bookkeeper1.
Make bookkeeper read-only.
Decommission bookkeeper down to zero replicas (as we are using Kubernetes, scale down one node and let autorecovery re-replicate the ledgers).
Restart the brokers.
The broker runs into a loop on the health check.
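
The symptom in the last step can be observed from outside the broker by calling its health check endpoint with a client-side timeout. This is a minimal sketch, not taken from the thread: it assumes the broker's admin REST API is reachable at a base URL such as http://localhost:8080 (a placeholder), and that a healthy broker answers GET /admin/v2/brokers/health with the body "ok". The function names and the 10-second default timeout are illustrative. Whether the loop reported here surfaces as a hang or as an error response is not stated in the thread, so the sketch treats both as "unhealthy".

```python
import urllib.error
import urllib.request


def health_url(base_url: str) -> str:
    """Build the broker health check URL from an admin base URL (assumed layout)."""
    return base_url.rstrip("/") + "/admin/v2/brokers/health"


def check_broker_health(base_url: str, timeout_s: float = 10.0):
    """Return (healthy, detail) for one broker.

    A timeout, a connection error, or an HTTP error status is reported
    as unhealthy instead of blocking the caller.
    """
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout_s) as resp:
            body = resp.read().decode("utf-8", errors="replace").strip()
            return resp.status == 200 and body == "ok", body
    except (urllib.error.URLError, OSError) as exc:
        # URLError covers HTTP errors and refused connections;
        # OSError covers socket timeouts raised while reading.
        return False, f"health check failed: {exc}"
```

Calling `check_broker_health("http://localhost:8080")` against each broker after the restart gives a quick yes/no signal without tying up a terminal on a call that does not return.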

What did you expect to see?

The broker health check should continue to work.

What did you see instead?

The broker health check ran into a loop.

Anything else?

No response

Are you willing to submit a PR?

I'm willing to submit a PR!

hpvd · Answer 1 · Tue Apr 23 2024 15:53:49 GMT+0800 (China Standard Time)

Since 2.10 is no longer supported, can you please check whether this also appears in newer versions?
For details, see:

supported versions: https://pulsar.apache.org/contribute/release-policy/
latest pulsar helm chart: https://github.com/apache/pulsar-helm-chart/releases

Wallace Peng · Answer 2 · Tue Apr 23 2024 22:03:21 GMT+0800 (China Standard Time)

@hpvd We are downgrading the storage, so we provisioned the bookkeeper cluster. We will upgrade the cluster a bit later. Is there any way to clean the healthcheck topic? It looks like the ledger cached the old bookies.
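
The suspicion that a ledger still points at the old bookies can be checked by comparing each ledger's ensemble against the bookies that are currently registered. The sketch below is an illustration only and was not part of the thread: it assumes the ensembles and the live bookie list have already been collected by hand (for example from the output of `bin/bookkeeper shell ledgermetadata -ledgerid <id>` and `bin/bookkeeper shell listbookies -rw`), and the ledger IDs and bookie addresses in the usage example are made up.

```python
def ledgers_on_missing_bookies(ledger_ensembles, live_bookies):
    """Map each ledger ID to the bookies in its ensemble that are no longer live.

    ledger_ensembles: dict of ledger ID -> iterable of bookie addresses
    live_bookies: iterable of bookie addresses currently registered
    Ledgers whose ensembles are fully live are left out of the result.
    """
    live = set(live_bookies)
    stale = {}
    for ledger_id, bookies in ledger_ensembles.items():
        missing = sorted(set(bookies) - live)
        if missing:
            stale[ledger_id] = missing
    return stale


# Hypothetical data: ledger 42 still references a decommissioned bookie.
ensembles = {
    41: ["bookkeeper1-0:3181", "bookkeeper1-1:3181"],
    42: ["bookkeeper-0:3181", "bookkeeper1-0:3181"],
}
live = ["bookkeeper1-0:3181", "bookkeeper1-1:3181"]
print(ledgers_on_missing_bookies(ensembles, live))
```

A non-empty result for the ledgers behind the healthcheck topic would be consistent with the stale-bookie theory; an empty one would suggest looking elsewhere.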

Wallace Peng · Answer 3 · Fri Apr 26 2024 10:54:32 GMT+0800 (China Standard Time)

I finally fixed it. I had to set up another broker cluster, clean up the namespace, managed ledgers, and schemas, and then restore the old broker cluster. That fixed the issue.