acl-dev / acl

C/C++ server and network library, including coroutine, redis client, http/https/websocket, mqtt, and mysql/postgresql/sqlite clients, for Linux, Android, iOS, MacOS, Windows, etc.

Home Page: https://acl-dev.cn

Assigning slots to a newly added node fails

man-laughing opened this issue:

System environment: CentOS 7.6
Software environment: Redis-3.2.12

redis_builder was built with the following steps:
$ cd lib_acl; make
$ cd lib_protocol; make
$ cd lib_acl_cpp; make
$ cd app/redis_tools/redis_builder; make

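My setup: a Redis Cluster with three masters and three replicas, currently working normally.
Desired setup: add one master and one replica to the cluster.

Note: every node in the cluster requires password authentication, and the new nodes use the same password as the existing ones.

The failing command: assigning slots to the newly added master (10.116.168.85:9001), as follows: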
[root@xxxxxx redis_builder]# ./redis_builder -s 10.116.168.85:9001 -p abc -a reshard

addr: 127.0.0.1:9001
id: 282b7679e7266e5e727eca597791e3cba84f1bda
slots: 0 - 0

addr: 10.116.168.85:8003
id: 3a2f85b9b5e4a16d498ede66814e1f42b5f1a099
slots: 10923 - 16383

addr: 10.116.168.85:8001
id: 43f8323c15f8f6ba3d3fad21126ce7af7018663f
slots: 0 - 5461

addr: 10.116.168.85:8005
id: f40e810374720e76ffc7b1b71c4cd1615f7dd00d
slots: 5462 - 10922
How many slots do you want to move (from 1 to 16384) ? 4000
What is the receiving node ID? 282b7679e7266e5e727eca597791e3cba84f1bda
Please input all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots
Type 'done' once you entered all the source node IDs.
Source node #1: all
Moving slot 10923 from 10.116.168.85|8003 to 127.0.0.1|9001: move key: keyytest891 error: Success, from: 10.116.168.85|8003, to: 127.0.0.1|9001
move key: keyytest891 error, from: 10.116.168.85|8003, to: 127.0.0.1|9001
move slots error, slot: 10923
move failed, stop!
move over!

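I downloaded Redis 3.2.12 and set up a local environment. Adding nodes and migrating slots both worked in my test. Here is what I did:
1. Start 9 redis-server instances listening on ports 9101 through 9109, and create a nodes.xml file as follows:

Then run $ ./redis_builder -f ./nodes.xml -a create -r 2 to form a cluster from the first 6 nodes: 2 masters and 4 replicas (that is, 2 replicas per master). Running $ ./redis_builder -s 127.0.0.1:9101 -a nodes shows: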
machine: 127.0.0.1
|--- master: 127.0.0.1:9101, id: 912b6ffb529a4fd8c0ce2164bcbbd1bae7e20f04, slots: [0-8191]
        |--- slave: 127.0.0.1:9105, id: c7d85e56c0e469f66f42140d8f7c80b55c895e0c
        |--- slave: 127.0.0.1:9106, id: 90135a9d430432bcc53f8c3e7f2ba3ab632eb81a
|--- master: 127.0.0.1:9104, id: b0f14439b19b9559959fc76e9381c0b7383ae605, slots: [8192-16383]
        |--- slave: 127.0.0.1:9102, id: 7cd8ebc51bf2c9c1ebb1f70f89be6e1e46abda18
        |--- slave: 127.0.0.1:9103, id: f9a7e6cbb6eb0264ca8a6b847b9f05974ad1ef29
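
The layout above can be checked with a little arithmetic. The following is a minimal sketch, not redis_builder's actual code: `master_count` and `split_slots` are hypothetical helpers, and the "extra slots go to the first masters" rule is an assumption chosen because it reproduces the ranges shown in this thread.

```python
from typing import List, Tuple

TOTAL_SLOTS = 16384  # fixed number of hash slots in a Redis Cluster


def master_count(node_count: int, replicas_per_master: int) -> int:
    """Each master plus its replicas forms one group of nodes."""
    return node_count // (replicas_per_master + 1)


def split_slots(masters: int) -> List[Tuple[int, int]]:
    """Split all slots into contiguous, near-equal ranges (assumed scheme)."""
    base, extra = divmod(TOTAL_SLOTS, masters)
    ranges = []
    start = 0
    for i in range(masters):
        # The first `extra` masters take one additional slot each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


print(master_count(6, 2))  # 6 nodes with -r 2
print(split_slots(2))
```

With 6 nodes and `-r 2` this gives 2 masters owning 0-8191 and 8192-16383, matching the listing above.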

2. Add new nodes:
Add a master node to the cluster above:
$ ./redis_builder -s 127.0.0.1:9101 -a add_node -N 127.0.0.1:9107
Then add two replicas to node 9107:
$ ./redis_builder -s 127.0.0.1:9107 -a add_node -N 127.0.0.1:9108 -S
$ ./redis_builder -s 127.0.0.1:9107 -a add_node -N 127.0.0.1:9109 -S
Show the node information of the new cluster:
$ ./redis_builder -s 127.0.0.1:9107 -a nodes

machine: 127.0.0.1
|--- master: 127.0.0.1:9101, id: 912b6ffb529a4fd8c0ce2164bcbbd1bae7e20f04, slots: [0-8191]
        |--- slave: 127.0.0.1:9106, id: 90135a9d430432bcc53f8c3e7f2ba3ab632eb81a
        |--- slave: 127.0.0.1:9105, id: c7d85e56c0e469f66f42140d8f7c80b55c895e0c
|--- master: 127.0.0.1:9104, id: b0f14439b19b9559959fc76e9381c0b7383ae605, slots: [8192-16383]
        |--- slave: 127.0.0.1:9103, id: f9a7e6cbb6eb0264ca8a6b847b9f05974ad1ef29
        |--- slave: 127.0.0.1:9102, id: 7cd8ebc51bf2c9c1ebb1f70f89be6e1e46abda18
|--- master: 127.0.0.1:9107, id: febd0867355883d18ded4561442f3eedbbb1de32, slots:
        |--- slave: 127.0.0.1:9109, id: 3035fd4502660c0b745b83e85bae3e34984a0652
        |--- slave: 127.0.0.1:9108, id: 35d216832c128edab779f18d0275a8fd30f5543c

As shown, master 9107 does not own any slots yet. Run the following command to start the migration:

$ ./redis_builder -s 127.0.0.1:9101 -a reshard
The prompts are as follows:

addr: 127.0.0.1:9101
id: 912b6ffb529a4fd8c0ce2164bcbbd1bae7e20f04
slots: 0 - 8191
-----------------------------------------------
addr: 127.0.0.1:9104
id: b0f14439b19b9559959fc76e9381c0b7383ae605
slots: 8192 - 16383
-----------------------------------------------
addr: 127.0.0.1:9107
id: febd0867355883d18ded4561442f3eedbbb1de32
How many slots do you want to move (from 1 to 16384) ? 5000  <-- migrate 5000 slots
What is the receiving node ID? febd0867355883d18ded4561442f3eedbbb1de32 <-- this ID identifies node 9107
Please input all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots
  Type 'done' once you entered all the source node IDs.
Source node #1: all

Slot migration then starts, with output like this:
...
Notify all: slot 10691, moved to febd0867355883d18ded4561442f3eedbbb1de32 ok
moved 5000 slots ok
move over!

3. Show the slot distribution after the migration:
$ ./redis_builder -s 127.0.0.1:9101 -a nodes

machine: 127.0.0.1
|--- master: 127.0.0.1:9101, id: 912b6ffb529a4fd8c0ce2164bcbbd1bae7e20f04, slots: [2500-8191]
        |--- slave: 127.0.0.1:9105, id: c7d85e56c0e469f66f42140d8f7c80b55c895e0c
        |--- slave: 127.0.0.1:9106, id: 90135a9d430432bcc53f8c3e7f2ba3ab632eb81a
|--- master: 127.0.0.1:9104, id: b0f14439b19b9559959fc76e9381c0b7383ae605, slots: [10692-16383]
        |--- slave: 127.0.0.1:9102, id: 7cd8ebc51bf2c9c1ebb1f70f89be6e1e46abda18
        |--- slave: 127.0.0.1:9103, id: f9a7e6cbb6eb0264ca8a6b847b9f05974ad1ef29
|--- master: 127.0.0.1:9107, id: febd0867355883d18ded4561442f3eedbbb1de32, slots: [0-2499] [8192-10691]
        |--- slave: 127.0.0.1:9108, id: 35d216832c128edab779f18d0275a8fd30f5543c
        |--- slave: 127.0.0.1:9109, id: 3035fd4502660c0b745b83e85bae3e34984a0652

As shown, the new node 9107 now owns 5000 slots.
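
To double-check a distribution like the one above, the slot ranges can simply be counted. This is a small sketch that parses the `slots:` field as printed by the `nodes` action in this thread; `count_slots` is a hypothetical helper, not part of acl.

```python
import re


def count_slots(slots_field: str) -> int:
    """Sum the sizes of all "[lo-hi]" ranges in a slots field."""
    total = 0
    for lo, hi in re.findall(r"\[(\d+)-(\d+)\]", slots_field):
        total += int(hi) - int(lo) + 1  # ranges are inclusive
    return total


# Slot fields copied from the listing after the reshard.
after_reshard = {
    "127.0.0.1:9101": "[2500-8191]",
    "127.0.0.1:9104": "[10692-16383]",
    "127.0.0.1:9107": "[0-2499] [8192-10691]",
}

counts = {addr: count_slots(field) for addr, field in after_reshard.items()}
print(counts)
print(sum(counts.values()))
```

Node 9107 ends up with 2500 slots taken from each of the two original masters, and the three masters together still cover all 16384 slots.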

@zhengshuxin I forgot to mention that my cluster uses password authentication, and the newly added nodes do as well.

It also works for me with password authentication enabled.

Could you send me a write-up of the steps you followed? I would like to go through them again for comparison.

@zhengshuxin I found the problem: if the cluster contains keys, reshard fails; if the cluster contains no keys, reshard succeeds. Is this a bug?
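
This observation is consistent with how slot migration generally works in Redis Cluster: an empty slot only needs its ownership changed, while a slot that holds keys needs each key moved with MIGRATE, and that is the step that fails in the logs here. Which slot a key lives in is defined by the Redis Cluster specification as CRC16 of the key modulo 16384. A minimal sketch of that mapping (the function names are my own, not acl's):

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 as used by Redis Cluster: polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Return the hash slot of a key, honoring {hash tags}."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag between the first "{" and the next "}" counts.
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


print(key_slot("foo"))
print(key_slot("{foo}.bar") == key_slot("foo"))
```

On a live node, `CLUSTER KEYSLOT <key>` reports the same value, so test keys can be mapped to the slot whose move failed.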

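There was indeed a bug. It has been fixed, so please update and try again. However, my testing also showed that Redis 3.x has a problem of its own: MIGRATE cannot carry an AUTH option there (AUTH was added in Redis 4.0.7, see https://redis.io/commands/migrate/ ), so the migration fails. Yet if a password is set in the Redis configuration, the target node answers with a NOAUTH error. These two behaviors contradict each other, so if you want data to migrate properly, it is best not to set a password on Redis versions below 4.0.7, otherwise the migration will fail. Newer Redis versions do not have this problem: I tested on Redis 6.2.6, and authentication can be passed during migration.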
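
The version rule above can be expressed as a tiny pre-flight check. This is only an illustrative sketch: `migrate_supports_auth` is a hypothetical helper, and it assumes a plain dotted version string such as the `redis_version` field of `INFO server`.

```python
def parse_version(text):
    """Turn "3.2.12" into (3, 2, 12) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def migrate_supports_auth(server_version):
    """MIGRATE accepts the AUTH option starting with Redis 4.0.7."""
    return parse_version(server_version) >= (4, 0, 7)


for version in ("3.2.12", "4.0.7", "6.2.6"):
    print(version, migrate_supports_auth(version))
```

Comparing tuples rather than strings matters here, because "3.2.12" would otherwise sort incorrectly against versions like "3.2.9".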

OK, thanks.

I tested with Redis-6.2.4 and it still reports an error:

How many slots do you want to move (from 1 to 16384) ? 2000
What is the receiving node ID? 1508b36582002b8999d9a46ad87e5c1ffe32253b
Please input all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots
Type 'done' once you entered all the source node IDs.
Source node #1: all
Moving slot 6 from 10.122.132.204:8001 to 10.122.132.204:8007: redis_command.cpp(402), logger_result: result type: 2, error: ERR Target instance replied with error: NOAUTH Authentication required., res: [ERR Target instance replied with error: NOAUTH Authentication required.
], req:[*6
$7
MIGRATE
$14
10.122.132.204
$4
8007
$10
keytest941
$1
0
$5
15000
]
move key: keytest941 error: ERR Target instance replied with error: NOAUTH Authentication required., from: 10.122.132.204:8001, to: 10.122.132.204:8007
move key: keytest941 error, from: 10.122.132.204:8001, to: 10.122.132.204:8007
move slots error, slot: 6
move failed, stop!
move over!
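
The request dumped in the log above has six arguments and no AUTH option, which would explain the NOAUTH reply from a password-protected target. The sketch below rebuilds that request in RESP and shows what it looks like with `AUTH password` appended, following the documented MIGRATE syntax. `encode_resp` and `migrate_args` are hypothetical helpers for illustration, not acl functions, and `abc` is just the placeholder password from the first command in this thread.

```python
def encode_resp(args):
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg).encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def migrate_args(host, port, key, timeout_ms, password=None):
    """Build MIGRATE host port key destination-db timeout [AUTH password]."""
    args = ["MIGRATE", host, port, key, 0, timeout_ms]
    if password is not None:
        # The AUTH option requires Redis 4.0.7 or later.
        args += ["AUTH", password]
    return args


plain = encode_resp(migrate_args("10.122.132.204", 8007, "keytest941", 15000))
with_auth = encode_resp(
    migrate_args("10.122.132.204", 8007, "keytest941", 15000, password="abc")
)
print(plain.decode())
print(with_auth.decode())
```

The first request matches the logged one byte for byte. The second one starts with `*8` and ends with the `AUTH` and password bulk strings.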

Did you pass the password with the -p option when running redis_builder for the migration? Also, are you using the latest acl?

Yes, I passed the password with the -p option. I just tested again with the acl 3.5.3-10 release and got the following error:

How many slots do you want to move (from 1 to 16384) ? 2000
What is the receiving node ID? 1508b36582002b8999d9a46ad87e5c1ffe32253b
Please input all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots
Type 'done' once you entered all the source node IDs.
Source node #1: all
Moving slot 6 from 10.122.132.204|8001 to 10.122.132.204|8007: move key: keytest941 error: Success, from: 10.122.132.204|8001, to: 10.122.132.204|8007
move key: keytest941 error, from: 10.122.132.204|8001, to: 10.122.132.204|8007
move slots error, slot: 6
move failed, stop!
move over!

You need to use the development version. No release has been published yet since the redis_builder fix.

OK, thanks. It worked with the development version.