leo-project / leofs

The LeoFS Storage System

Home Page: https://leo-project.net/leofs/


Rename managers (master and slave)

posledov opened this issue

Hello.

I am faced with the need to rename the master and slave managers (their nodenames).

For example:
M0@dc1-mgr01.s3.local.lan -> DC1_M1@10.13.1.11
M1@dc1-mgr02.s3.local.lan -> DC1_M2@10.13.1.12

I tried to rename the managers according to this instruction https://leo-project.net/leofs/docs/admin/system_admin/leo_manager/#case-2-launch-a-new-manager-masterslave-instead-of-a-collapsed-node-takeover, but ran into problems when I tried to import the mnesia data from the backup files...
I apologize, but I didn't save the error text for a bug report 😞
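
If I recall the linked document correctly, the mnesia export/import there is done with the backup-mnesia / restore-mnesia commands of leofs-adm; roughly what I ran was the following (the backup path is a placeholder), and the restore step is where the errors appeared:

leofs-adm backup-mnesia /path/to/mnesia-backup
leofs-adm restore-mnesia /path/to/mnesia-backup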

LeoFS version is 1.4.3

Please tell me if it is possible to do this and if so, what steps should be taken.

Thank you.
Kind regards.

I will investigate how to rename LeoManager nodes and then share the procedure.

@yosukehara, hello.

Sorry... Is there any news/progress on the procedure for renaming the managers?

Sorry... Is there any news/progress on the procedure for renaming the managers?

I started investigating this issue last Friday and am still working on it.

I would like to share the procedure below. The important point is to back up all mnesia files of the LeoManager nodes so that they can be restored.

[Procedure]

  1. Stop all nodes
    1. Stop LeoManager Master node
    2. Stop LeoManager Slave node
    3. Stop LeoGateway node(s)
    4. Stop LeoStorage node(s)
  2. Move LeoManager's mnesia files (move, do NOT copy)
    1. Make a directory to store the current mnesia files (both Master and Slave)
    2. Move work/mnesia/127.0.0.1/* to path/to/mnesia-archive-dir (both Master and Slave)
  3. Modify the configuration with the new node names (see the sketch after this list)
  4. Start all nodes
    1. Start LeoManager Master node
    2. Start LeoManager Slave node
    3. Start LeoStorage node(s)
    4. Start LeoGateway node(s)
  5. Execute the leofs-adm status command to confirm that the state of the storage nodes is attached
  6. Execute leofs-adm start command
    • Confirm the state of the storage nodes - running
    • Confirm the state of the gateway nodes - running
    • Confirm Manager RING hash
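
A rough sketch of steps 2 and 3, using the node names from your first comment as an example (the paths, file names and configuration keys below are from memory, so please verify them against your installation):

# Step 2: archive the current mnesia files (run on both Master and Slave)
mkdir -p /path/to/mnesia-archive-dir
mv work/mnesia/127.0.0.1/* /path/to/mnesia-archive-dir/

# Step 3: set the new node names in the configuration files
# leo_manager_0.conf (Master):
#   nodename        = DC1_M1@10.13.1.11
#   manager.partner = DC1_M2@10.13.1.12
# leo_manager_1.conf (Slave):
#   nodename        = DC1_M2@10.13.1.12
#   manager.partner = DC1_M1@10.13.1.11
# leo_storage.conf and leo_gateway.conf:
#   managers = [DC1_M1@10.13.1.11, DC1_M2@10.13.1.12]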

Hello, @yosukehara
First of all, thank you for your help!

I performed all the steps you described (in the order you specified), but after starting the cluster the information about the buckets was lost...

(Screenshot: 2021-01-19 at 21:55:41)
(I assume that the information about users, permissions, etc. has also become unavailable... I did not check, since the missing buckets are already reason enough for a rollback.)

Please tell me, is it possible to rename the managers while preserving all cluster data?

I forgot to mention that you also need to recreate the users, endpoints and buckets.

After that you should be able to access the buckets and objects again.
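
For example, something like the following, where the user, endpoint and bucket names are placeholders:

leofs-adm create-user <user-id>
leofs-adm add-endpoint <endpoint>
leofs-adm add-bucket <bucket> <access-key-id>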

Hello @yosukehara

Sorry for the late feedback…
After testing the steps to recreate the users, buckets and endpoints, I can confirm that the buckets' data is available again.

But there is one note: instead of creating users, I used the import-user command with the old access-key-ids and secret-access-keys:

leofs-adm delete-user _test_leofs

leofs-adm import-user <user1> <access-key-id1> <secret-access-key1>
leofs-adm import-user <user2> <access-key-id2> <secret-access-key2>

leofs-adm update-user-role <user1> 9
leofs-adm update-user-role <user2> 9

leofs-adm add-bucket <bucket1> <access-key-id1>
leofs-adm add-bucket <bucket2> <access-key-id2>

leofs-adm update-acl <bucket2> <access-key-id2> public-read

leofs-adm add-endpoint s3.example.net
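
To double-check the restored metadata, I think the standard listing commands can be used (output omitted here):

leofs-adm get-users
leofs-adm get-buckets
leofs-adm get-endpoints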

Thank you very much for your help!

Kind regards.
Igor.

Thank you for sharing. Please let me know the results of leofs-adm whereis <file-path>.

leofs-adm whereis bucket1/materialicons/
-------+-----------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |         node          |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when
-------+-----------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
       | DC1_S1@127.0.0.1      | 19fb02e2870952f21ea4cbd75ca7c1dc     |         0B |   d41d8cd98f | false          |              0 | 5b9a63d8ed7e6  | 2021-01-24 16:28:52 +0200


leofs-adm whereis bucket1/materialicons/icon.css
-------+-----------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |         node          |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when
-------+-----------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
       | DC1_S1@127.0.0.1      | 570ae1b9220891ad6ada0467602e6c4      |       571B |   483145ffe2 | false          |              0 | 5b9a63dabf375  | 2021-01-24 16:28:53 +0200

Can you please tell me why has children for bucket1/materialicons/ is false?

Thank you for your reply. It seems to be an error in the configuration of LeoManager (Master). Looking at the figure you attached, it is set to "N = 3". However, the result of whereis shows "N = 1".

I recommend that you try to start again with #1204 (comment) and #1204 (comment).
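
Before starting the new Master, it is also worth checking the consistency settings in its leo_manager_0.conf; if I remember the key names correctly, they look like the following (the values are only an example matching N = 3). After the cluster is up, leofs-adm status should report the same consistency level.

consistency.num_of_replicas = 3
consistency.write = 2
consistency.read = 1
consistency.delete = 2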

Looking at the figure you attached, it is set to "N = 3". However, the result of whereis shows "N = 1".

Oh, sorry. That screenshot was taken in the production cluster, while the whereis results are from the test environment, during my second attempt to go through all the steps for renaming the managers.
Sorry for the confusion, and thank you for your consideration 🤝