lae / ansible-role-proxmox

IaC for Proxmox VE clusters.

Ceph crushmap tasks not idempotent?

lae opened this issue

I ran into this while working on an unrelated PR: these tasks always come out as changed in the Vagrant test environment (vagrant up). Maybe this broke in PVE 7? (Or maybe it just wasn't caught in PVE 6? I'll have to test that out later.)

PLAY [all] *********************************************************************

TASK [lae.proxmox : HOOK - Run ZFS post-install hook tasks] ********************
included: /home/lae/src/ansible-role-proxmox/tests/vagrant/tasks/zpool_setup.yml for pve-3, pve-2, pve-1

TASK [lae.proxmox : Modify crushmap for rules that should be updated] **********
changed: [pve-3] => (item={'name': 'hdd'}) => {
    "ansible_loop_var": "item",
    "changed": true,
    "item": {
        "name": "hdd"
    }
}

MSG:

1 replacements made

TASK [lae.proxmox : Compress and upload changed crushmap] **********************
changed: [pve-3] => (item=crushtool -c crush_map_decompressed -o new_crush_map_compressed) => {
    "ansible_loop_var": "item",
    "changed": true,
    "cmd": [
        "crushtool",
        "-c",
        "crush_map_decompressed",
        "-o",
        "new_crush_map_compressed"
    ],
    "delta": "0:00:00.008121",
    "end": "2021-10-19 15:24:29.408096",
    "item": "crushtool -c crush_map_decompressed -o new_crush_map_compressed",
    "rc": 0,
    "start": "2021-10-19 15:24:29.399975"
}
changed: [pve-3] => (item=ceph osd setcrushmap -i new_crush_map_compressed) => {
    "ansible_loop_var": "item",
    "changed": true,
    "cmd": [
        "ceph",
        "osd",
        "setcrushmap",
        "-i",
        "new_crush_map_compressed"
    ],
    "delta": "0:00:00.400244",
    "end": "2021-10-19 15:24:29.975847",
    "item": "ceph osd setcrushmap -i new_crush_map_compressed",
    "rc": 0,
    "start": "2021-10-19 15:24:29.575603"
}

STDERR:

7

PLAY RECAP *********************************************************************
pve-1                      : ok=69   changed=0    unreachable=0    failed=0    skipped=64   rescued=0    ignored=0
pve-2                      : ok=69   changed=0    unreachable=0    failed=0    skipped=64   rescued=0    ignored=0
pve-3                      : ok=84   changed=2    unreachable=0    failed=0    skipped=49   rescued=0    ignored=0

It's a misleading 'changed'. I believe this has been in there for a good while now, since a previous Ceph change I made. I'll assign it to myself and fix it.

So I have found the problem, and the culprit is whitespace. In the replace section I separated the rule out onto multiple lines to make it readable and maintainable. Unfortunately, the >- folded block scalar replaces newlines with a single space, which results in the file being different, but only by whitespace. This is a fun one to try and fix whilst keeping it readable and maintainable! Rest assured: at the moment it may be saying changed each time, but it is not actually doing anything to your crushmaps!
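To make the whitespace problem concrete, here is a minimal sketch in Python. It is not the role's actual task: the `fold` helper is a simplified stand-in for what `>-` does to plain lines (it is not a YAML parser), and the rule text is a made-up example rather than the rule from this role.

```python
def fold(block: str) -> str:
    """Simplified stand-in for YAML '>-' folding: join lines with single spaces.

    Assumption: real YAML folding has extra rules (blank lines, more-indented
    lines) that are ignored here.
    """
    return " ".join(line.strip() for line in block.strip().splitlines())


# Hypothetical rule as it might appear in a decompiled crushmap (multi-line).
on_disk = """rule example_hdd {
    id 1
    type replicated
    step take default class hdd
    step chooseleaf firstn 0 type host
    step emit
}"""

# The same rule after being written across several lines and folded.
folded = fold(on_disk)

# A byte-for-byte comparison sees a difference, so a replace reports "changed".
print("identical text:  ", folded == on_disk)

# Comparing the whitespace-separated tokens shows the content is the same.
print("identical tokens:", folded.split() == on_disk.split())
```

The two strings carry the same tokens in the same order, so a comparison that ignores whitespace treats them as equal, while an exact text comparison does not.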

Gotcha. I was thinking it was probably something along that line.

Briefly reviewing the tasks now, I have a couple of suggestions to improve these tasks.

  • Instead of writing out crush_map_{de,}compressed files and having to delete them later, how about registering the stdout into a variable? In other words, can the two tools being used write to stdout and read from stdin?
  • To sanitize the output and make it easier to compare, maybe you can just massage the crushmap output through sed or tr.
  • Instead of using replace, maybe we can just use set_fact and then add a when clause to the upload task that compares the generated fact with the output that we got (with all their whitespace massaged to be the same).
  • As with the first item, we could pass the data over stdin when uploading a new crushmap.
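The compare-then-upload idea in the list above can be sketched outside Ansible as follows. This is only an illustration in Python under stated assumptions: `replace_rule` and `needs_upload` are hypothetical helper names, the crushmap text is invented, and the sketch does not call crushtool or ceph at all (whether those tools can work over stdin and stdout is left open, as in the suggestion).

```python
import re


def normalize(text: str) -> str:
    """Collapse every run of whitespace to one space, like a tr/sed pass."""
    return " ".join(text.split())


def replace_rule(crushmap: str, name: str, new_rule: str) -> str:
    """Swap the body of 'rule <name> { ... }' for new_rule (hypothetical helper).

    Assumes the rule body contains no nested braces.
    """
    pattern = re.compile(r"rule\s+" + re.escape(name) + r"\s*\{[^}]*\}")
    return pattern.sub(lambda match: new_rule, crushmap, count=1)


def needs_upload(current: str, desired: str) -> bool:
    """Equivalent of a 'when' clause: upload only on a non-whitespace change."""
    return normalize(current) != normalize(desired)


# Invented decompiled crushmap text standing in for registered stdout.
current = "# begin\nrule hdd {\n\tid 1\n\ttype replicated\n\tstep emit\n}\n# end\n"

# Same rule, different whitespace: no upload should be triggered.
same_rule = "rule hdd { id 1 type replicated step emit }"
unchanged = replace_rule(current, "hdd", same_rule)
print("upload for whitespace-only change:", needs_upload(current, unchanged))

# Rule with a real difference: an upload is warranted.
new_rule = "rule hdd { id 1 type replicated step take default class hdd step emit }"
changed = replace_rule(current, "hdd", new_rule)
print("upload for real change:", needs_upload(current, changed))
```

Keeping the generated map in a variable and gating the upload on the normalized comparison means the task only reports changed when the rule content differs.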

I'll let you figure out what works best, but I hope the above helps.

I like your suggestions and will try them out, as it would be nice to neaten this section up. In the meantime I have issued a PR that at least fixes the issue :-)