redis / redis

Redis is an in-memory database that persists on disk. The data model is key-value, but many different kinds of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLogs, Bitmaps.

Home Page: http://redis.io


[BUG] Sentinel failed to fail over while the master was down [4.0.14]

luijianfie opened this issue

Describe the bug

Topology:
10.250.17.68:6379 master
10.250.17.80:6379 slave
10.250.17.68:6380 sentinel
10.250.17.80:6380 sentinel
10.250.17.32:6380 sentinel

All hosts are virtual machines. Host 10.250.17.68 needed to be shut down for maintenance.
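Whether a failover is possible at all once 10.250.17.68 is shut down depends on how many sentinels remain. According to the Redis Sentinel documentation, the quorum only governs ODOWN detection, while the failover itself needs a leader elected by a majority of the known sentinels. The sketch below is a simplified model under stated assumptions: the quorum of 2 is inferred from the `#quorum 2/2` log entries rather than from a posted sentinel.conf, and the helper names are hypothetical.

```python
# Simplified model; QUORUM = 2 is inferred from "#quorum 2/2" in the logs.
SENTINELS = ["10.250.17.68:6380", "10.250.17.80:6380", "10.250.17.32:6380"]
QUORUM = 2


def surviving_sentinels(sentinels, stopped_host):
    """Hypothetical helper: sentinels left running when `stopped_host` goes down."""
    return [s for s in sentinels if not s.startswith(stopped_host + ":")]


def can_fail_over(all_sentinels, reachable, quorum):
    """ODOWN needs `quorum` agreeing sentinels; electing a failover leader
    needs votes from a majority of all known sentinels."""
    majority = len(all_sentinels) // 2 + 1
    return len(reachable) >= max(quorum, majority)


reachable = surviving_sentinels(SENTINELS, "10.250.17.68")
print(len(reachable), can_fail_over(SENTINELS, reachable, QUORUM))  # 2 True
```

With one of three sentinels stopped, two remain, which meets both the quorum (2) and the majority (2), so in this model the host shutdown alone should not have prevented a failover.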

Master 10.250.17.68:6379 exited:
[10.250.17.68-redis-6379]3894:signal-handler (1713937702) Received SIGTERM scheduling shutdown...
[10.250.17.68-redis-6379]3894:M 24 Apr 13:48:22.373 # User requested shutdown...
[10.250.17.68-redis-6379]3894:M 24 Apr 13:48:22.373 * Calling fsync() on the AOF file.
[10.250.17.68-redis-6379]3894:M 24 Apr 13:48:22.373 # Redis is now ready to exit, bye bye...

Sentinel 10.250.17.68:6380 exited:
[10.250.17.68-redis-seintiel-6380]3980:signal-handler (1713937702) Received SIGTERM scheduling shutdown...
[10.250.17.68-redis-seintiel-6380]3980:X 24 Apr 13:48:22.379 # User requested shutdown...
[10.250.17.68-redis-seintiel-6380]3980:X 24 Apr 13:48:22.379 # Sentinel is now ready to exit, bye bye...

Sentinel 10.250.17.32:6380 marked the master as sdown at 13:48:42.518:
[10.250.17.32-redis-sentinel]3812:X 24 Apr 13:48:42.518 # +sdown master sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.32-redis-sentinel]3812:X 24 Apr 13:48:42.518 # +sdown sentinel 723cccee8b0adf35a3669c17f698ab8e4968c46a 10.250.17.68 6380 @ sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.32-redis-sentinel]
[10.250.17.32-redis-sentinel]
[10.250.17.32-redis-sentinel]3812:X 24 Apr 13:53:42.571 # +new-epoch 1
[10.250.17.32-redis-sentinel]3812:X 24 Apr 13:53:42.572 # +vote-for-leader 7c484ccb1655219c36b98d86887fcbdf29ede55f 1
[10.250.17.32-redis-sentinel]3812:X 24 Apr 13:53:43.513 # +odown master sentinel-10.250.17.68-6379 10.250.17.68 6379 #quorum 2/2
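As a rough check on the 10.250.17.32 timeline, the sketch below computes the gaps between the master's exit, this sentinel's +sdown and its +odown, for comparison with the 20000 ms down-after-milliseconds value given in the questions below. It assumes all events happened on the same day and that the hosts' clocks were in sync, which the logs do not confirm; `ts` is a hypothetical helper.

```python
from datetime import datetime


def ts(clock):
    """Parse an 'HH:MM:SS.mmm' log timestamp; all events assumed on 24 Apr."""
    return datetime.strptime(clock, "%H:%M:%S.%f")


master_exit = ts("13:48:22.373")  # SIGTERM on 10.250.17.68:6379
sdown_32 = ts("13:48:42.518")     # +sdown on 10.250.17.32
odown_32 = ts("13:53:43.513")     # +odown on 10.250.17.32

exit_to_sdown = (sdown_32 - master_exit).total_seconds()
sdown_to_odown = (odown_32 - sdown_32).total_seconds()
print(exit_to_sdown, sdown_to_odown)  # 20.145 300.995
```

In this reading, 10.250.17.32 marked the master +sdown about 20.1 s after it exited, consistent with down-after-milliseconds, but only reached +odown about 301 s (roughly five minutes) later.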

Sentinel 10.250.17.80:6380 repeatedly marked the master +sdown and -sdown, and only marked it as odown at 13:53:42.567:
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:48:22.533 # +sdown master sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:48:23.469 # -sdown master sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:48:23.527 # +sdown master sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:48:24.500 # -sdown master sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:48:42.504 # +sdown sentinel 723cccee8b0adf35a3669c17f698ab8e4968c46a 10.250.17.68 6380 @ sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:53:42.505 # +sdown master sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:53:42.567 # +odown master sentinel-10.250.17.68-6379 10.250.17.68 6379 #quorum 2/2
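Question 2 below asks why 10.250.17.80 flapped between +sdown and -sdown. A small parser makes the length of each sdown period explicit. This is a sketch that assumes the log line layout shown above (`<time> # <event> master ...`); the regex and helper names are hypothetical, and the `+sdown sentinel` line is included only to show that it is filtered out.

```python
import re
from datetime import datetime

# Excerpt of the 10.250.17.80 sentinel log above.
LOG = """\
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:48:22.533 # +sdown master sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:48:23.469 # -sdown master sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:48:23.527 # +sdown master sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:48:24.500 # -sdown master sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:48:42.504 # +sdown sentinel 723cccee8b0adf35a3669c17f698ab8e4968c46a 10.250.17.68 6380 @ sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:53:42.505 # +sdown master sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:53:42.567 # +odown master sentinel-10.250.17.68-6379 10.250.17.68 6379 #quorum 2/2
"""

# Matches "<HH:MM:SS.mmm> # <+/-event> master " and skips sentinel-level events.
EVENT = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{3}) # ([+-]\w+) master ")


def master_events(text):
    """Return (timestamp, event) pairs for master-level sentinel events."""
    return [(datetime.strptime(m.group(1), "%H:%M:%S.%f"), m.group(2))
            for m in EVENT.finditer(text)]


def sdown_periods(events):
    """Pair each +sdown with the next -sdown; duration is None if never cleared."""
    periods, start = [], None
    for when, name in events:
        if name == "+sdown":
            start = when
        elif name == "-sdown" and start is not None:
            periods.append((start, (when - start).total_seconds()))
            start = None
    if start is not None:
        periods.append((start, None))
    return periods


for start, dur in sdown_periods(master_events(LOG)):
    print(start.strftime("%H:%M:%S.%f")[:-3], dur)
```

On this excerpt it reports two sdown periods of under one second each (0.936 s and 0.973 s) starting at 13:48:22.533 and 13:48:23.527, followed by a final +sdown at 13:53:42.505 that is never cleared.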

Questions:
1. Why did 10.250.17.80:6380 first mark the master +sdown at 13:48:22.533 when down-after-milliseconds is 20000? The master exited at 13:48:22.373, so shouldn't the earliest time for 10.250.17.80:6380 to mark the master as down be 13:48:42.373?

2. Why did 10.250.17.80:6380 repeatedly mark the master +sdown and -sdown?

3. Why did 10.250.17.80:6380 only mark the master as odown at 13:53:42.567? By then, more than 5 minutes had passed since the master went down. Host 10.250.17.68 completed its restart at 13:53:42.
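The arithmetic behind questions 1 and 3 can be checked directly. The Redis Sentinel documentation describes down-after-milliseconds as the time an instance must go without a valid reply before a sentinel considers it down; the sketch below assumes that rule is the whole story and that the hosts' clocks agree. Under that assumption, the observed +sdown at 13:48:22.533 would imply that 10.250.17.80 had received no valid reply since about 13:48:02.533, roughly 20 seconds before the SIGTERM; the logs shown here do not confirm or rule this out.

```python
from datetime import datetime, timedelta

# down-after-milliseconds value stated in the issue.
DOWN_AFTER = timedelta(milliseconds=20000)


def ts(clock):
    """Hypothetical helper: parse an 'HH:MM:SS.mmm' log timestamp (same day)."""
    return datetime.strptime(clock, "%H:%M:%S.%f")


master_exit = ts("13:48:22.373")
first_sdown_80 = ts("13:48:22.533")
odown_80 = ts("13:53:42.567")

# Question 1: earliest +sdown if the clock started at the master's exit.
expected_earliest = master_exit + DOWN_AFTER
# Latest possible "last valid reply" consistent with the observed +sdown.
implied_last_reply = first_sdown_80 - DOWN_AFTER
# Question 3: time from the master's exit to +odown on 10.250.17.80.
exit_to_odown = (odown_80 - master_exit).total_seconds()

print(expected_earliest.strftime("%H:%M:%S.%f")[:-3])   # 13:48:42.373
print(implied_last_reply.strftime("%H:%M:%S.%f")[:-3])  # 13:48:02.533
print(exit_to_odown)                                    # 320.194
```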

To reproduce

Failed to reproduce the issue.

Expected behavior

Sentinel should fail over successfully.
