leo-project / leofs

The LeoFS Storage System

Home Page: https://leo-project.net/leofs/


Rename managers (master and slave)

posledov opened this issue

Hello.

I am faced with the need to rename the master and slave managers (their nodenames).

For example:
M0@dc1-mgr01.s3.local.lan -> DC1_M1@10.13.1.11
M1@dc1-mgr02.s3.local.lan -> DC1_M2@10.13.1.12

I tried to rename the managers according to this instruction https://leo-project.net/leofs/docs/admin/system_admin/leo_manager/#case-2-launch-a-new-manager-masterslave-instead-of-a-collapsed-node-takeover, but ran into problems when I tried to import the mnesia data from the backup files...
I apologize, but I didn't save the error text for a bug report 😞
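
If I recall the linked document correctly, the mnesia export/import there is done with the backup-mnesia / restore-mnesia commands of leofs-adm; roughly what I ran was the following (the backup path is a placeholder), and the restore step is where the errors appeared:

leofs-adm backup-mnesia /path/to/mnesia-backup
leofs-adm restore-mnesia /path/to/mnesia-backup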

LeoFS version is 1.4.3

Please tell me if it is possible to do this and if so, what steps should be taken.

Thank you.
Kind regards.

I will investigate how to rename LeoManager nodes and then share the procedure.

@yosukehara, hello.

Sorry... Is there any news/progress on the procedure for renaming the managers?

Sorry... Is there any news/progress on the procedure for renaming the managers?

I started investigating this issue last Friday and am still working on it.

I would like to share the procedure below. The important point is to back up all mnesia files of the LeoManager nodes so that they can be restored.

[Procedure]

  1. Stop all nodes
    1. Stop LeoManager Master node
    2. Stop LeoManager Slave node
    3. Stop LeoGateway node(s)
    4. Stop LeoStorage node(s)
  2. Move LeoManager's mnesia files (move, do NOT copy)
    1. Make a directory to store the current mnesia files (both Master and Slave)
    2. Move work/mnesia/127.0.0.1/* to path/to/mnesia-archive-dir (both Master and Slave)
  3. Modify the configuration with the new node names (see the sketch after this list)
  4. Start all nodes
    1. Start LeoManager Master node
    2. Start LeoManager Slave node
    3. Start LeoStorage node(s)
    4. Start LeoGateway node(s)
  5. Execute the leofs-adm status command to confirm that the state of the storage nodes is attached
  6. Execute leofs-adm start command
    • Confirm the state of the storage nodes - running
    • Confirm the state of the gateway nodes - running
    • Confirm Manager RING hash
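
A rough sketch of steps 2 and 3, using the node names from your first comment as an example (the paths, file names and configuration keys below are from memory, so please verify them against your installation):

# Step 2: archive the current mnesia files (run on both Master and Slave)
mkdir -p /path/to/mnesia-archive-dir
mv work/mnesia/127.0.0.1/* /path/to/mnesia-archive-dir/

# Step 3: set the new node names in the configuration files
# leo_manager_0.conf (Master):
#   nodename        = DC1_M1@10.13.1.11
#   manager.partner = DC1_M2@10.13.1.12
# leo_manager_1.conf (Slave):
#   nodename        = DC1_M2@10.13.1.12
#   manager.partner = DC1_M1@10.13.1.11
# leo_storage.conf and leo_gateway.conf:
#   managers = [DC1_M1@10.13.1.11, DC1_M2@10.13.1.12]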

Hello, @yosukehara
First of all, thank you for your help!

I performed all the steps you described (in the order you specified), but after starting the cluster the information about the buckets was lost...

(Screenshot: 2021-01-19 at 21:55:41)
(I assume that the information about users, permissions, etc. has also become unavailable... I did not check, since the missing buckets are already reason enough for a rollback.)

Please tell me, is it possible to rename the managers while preserving all cluster data?

I forgot to mention that you also need to recreate the users, endpoints and buckets.

After that you should be able to access the buckets and objects again.
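
For example, something like the following, where the user, endpoint and bucket names are placeholders:

leofs-adm create-user <user-id>
leofs-adm add-endpoint <endpoint>
leofs-adm add-bucket <bucket> <access-key-id>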

Hello @yosukehara

Sorry for the late feedback…
After testing the steps to recreate the users, buckets and endpoints, I can confirm that the buckets' data is available again.

But there is one note: instead of creating users, I used the import-user command with the old access-key-ids and secret-access-keys:

leofs-adm delete-user _test_leofs

leofs-adm import-user <user1> <access-key-id1> <secret-access-key1>
leofs-adm import-user <user2> <access-key-id2> <secret-access-key2>

leofs-adm update-user-role <user1> 9
leofs-adm update-user-role <user2> 9

leofs-adm add-bucket <bucket1> <access-key-id1>
leofs-adm add-bucket <bucket2> <access-key-id2>

leofs-adm update-acl <bucket2> <access-key-id2> public-read

leofs-adm add-endpoint s3.example.net
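
To double-check the restored metadata, I think the standard listing commands can be used (output omitted here):

leofs-adm get-users
leofs-adm get-buckets
leofs-adm get-endpoints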

Thank you very much for your help!

Kind regards.
Igor.

Thank you for sharing. Please let me know the results of leofs-adm whereis <file-path>.

leofs-adm whereis bucket1/materialicons/
-------+-----------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |         node          |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when
-------+-----------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
       | DC1_S1@127.0.0.1      | 19fb02e2870952f21ea4cbd75ca7c1dc     |         0B |   d41d8cd98f | false          |              0 | 5b9a63d8ed7e6  | 2021-01-24 16:28:52 +0200


leofs-adm whereis bucket1/materialicons/icon.css
-------+-----------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |         node          |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when
-------+-----------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
       | DC1_S1@127.0.0.1      | 570ae1b9220891ad6ada0467602e6c4      |       571B |   483145ffe2 | false          |              0 | 5b9a63dabf375  | 2021-01-24 16:28:53 +0200

Can you please tell me why has children for bucket1/materialicons/ is false?

Thank you for your reply. It seems to be an error in the configuration of LeoManager (Master). Looking at the figure you attached, it is set to "N = 3". However, the result of whereis shows "N = 1".

I recommend that you try to start again with #1204 (comment) and #1204 (comment).
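
Before starting the new Master, it is also worth checking the consistency settings in its leo_manager_0.conf; if I remember the key names correctly, they look like the following (the values are only an example matching N = 3). After the cluster is up, leofs-adm status should report the same consistency level.

consistency.num_of_replicas = 3
consistency.write = 2
consistency.read = 1
consistency.delete = 2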

Looking at the figure you attached, it is set to "N = 3". However, the result of whereis shows "N = 1".

Oh, sorry. That screenshot was taken in the production cluster, while the whereis results are from the test environment, during my second attempt to go through all the steps for renaming the managers.
Sorry for the confusion, and thank you for your consideration 🤝