Wrong logic for reclass _param checks

Question

jiribroulik opened this issue 7 years ago · comments

Currently reclass appears to check that every _param is defined for a node. The check should instead happen when the pillar is applied (that is, at the end), and only verify that the final param is defined. Steps to reproduce:

Have this pillar:

parameters:
  _param:
    salt_glusterfs_service_host: ${_param:glusterfs_service_host}
    glusterfs_node01_address: ${_param:cluster_node01_address}
    glusterfs_node02_address: ${_param:cluster_node02_address}
    glusterfs_node03_address: ${_param:cluster_node03_address}
  glusterfs:
    client:
      volumes:
        salt_pki:
          path: /srv/salt/pki
          server: ${_param:salt_glusterfs_service_host}
          opts: "defaults,backup-volfile-servers=${_param:glusterfs_node01_address}:${_param:glusterfs_node02_address}:${_param:glusterfs_node03_address}"

In reality you only care about glusterfs_node01_address, glusterfs_node02_address and glusterfs_node03_address, because those are what the last line uses. Reclass, however, reports an error for cluster_node01_address, cluster_node02_address and cluster_node03_address, which are only intermediate params and are never used. So it should not report an error.

Can this be fixed please?

Petr Michalec · Answer 1 · Fri Dec 15 2017 18:36:00 GMT+0800 (China Standard Time)

I simulated the issue on this simple example: https://github.com/epcim/reclass-issue14

➜  reclass git:(master) tree
.
├── classes
│   ├── first.yml
│   ├── second.yml
│   └── third.yml
├── nodes
│   └── dontpanic.yml
└── reclass-config.yml

2 directories, 5 files
➜  reclass git:(master) cat classes/second.yml 

classes:
- first

parameters:
  _param:
    # aaa: dummy
     aaa: ${_param:yyy}
     ccc: ${_param:aaa}
  mykey: ${_param:ccc}

➜  reclass git:(master) cat classes/third.yml 

classes:
  - second

parameters:
  _param:
    ccc: 444
  mykey: ${_param:ccc}


➜  reclass git:(master) reclass --nodeinfo dontpanic                                  
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/reclass/values/refitem.py", line 53, in _resolve
    return path.get_value(context)
  File "/usr/lib/python2.7/dist-packages/reclass/utils/dictpath.py", line 128, in get_value
    return self._get_innermost_container(base)[self._get_key()]
KeyError: 'yyy'

-> dontpanic
   Cannot resolve ${_param:yyy}, at _param:aaa, in yaml_fs:///etc/reclass/classes/second.yml

Petr Michalec · Answer 2 · Fri Dec 15 2017 18:42:26 GMT+0800 (China Standard Time)

Bug is thrown from:
https://github.com/salt-formulas/reclass/blob/master/reclass/values/refitem.py#L55

Andrew Pickford · Answer 3 · Mon Dec 18 2017 18:21:40 GMT+0800 (China Standard Time)

If I've understood the issue correctly, this arises from how reclass merges parameters or, for scalar values, overwrites them. In @epcim's example, the logic for resolving _param:ccc first builds a list, [${_param:aaa}, 444], and then tries to resolve each list element in order. So reclass first tries to resolve _param:aaa, which looks for _param:yyy, which isn't present, and the whole thing fails.

This is perfectly reasonable behaviour when merging lists and dicts, as reclass needs to resolve each parameter involved in order to merge them together. But it does produce this edge case for scalars, where the final value depends only on the last parameter in the list and not on any of the preceding values.

Note that even if _param:ccc in the example were resolved to 444 without generating an error, an error would still occur, because reclass would still try to resolve _param:aaa, which would still fail.

To ignore unneeded parameters during a parameter merge/overwrite, reclass would need to resolve the parameter list in reverse order, with some logic for stopping on a failed resolve: if the final element is a scalar and all the resolvable previous elements are scalars, it would just use the final scalar value. For lists and dicts, however, once the elements are resolved the merge would still need to proceed from the first element in the parameter list.
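The difference between the two resolution orders can be illustrated with a minimal Python sketch. This is not reclass code: `resolve`, `resolve_value`, `Unresolved` and the `layers` structure are hypothetical names, only scalar values are modelled, and there is no cycle detection. `layers` holds, for each parameter, the values assigned by successive classes, oldest first, using the values from @epcim's example.

```python
import re

# Matches a value that consists of exactly one ${_param:name} reference.
REF = re.compile(r"\$\{_param:(\w+)\}")


class Unresolved(KeyError):
    """Raised when a referenced parameter has no definition."""


def resolve_value(value, layers, newest_first):
    """Follow a reference if `value` is one, otherwise return it unchanged."""
    match = REF.fullmatch(value) if isinstance(value, str) else None
    if match is None:
        return value
    return resolve(match.group(1), layers, newest_first)


def resolve(name, layers, newest_first):
    """Resolve scalar parameter `name` from its list of assigned values."""
    if name not in layers:
        raise Unresolved(name)
    candidates = layers[name]
    if newest_first:
        # Only the last assignment matters for a scalar, so skip the rest.
        return resolve_value(candidates[-1], layers, newest_first)
    # In-order resolution: every candidate is resolved, the last one wins.
    result = None
    for value in candidates:
        result = resolve_value(value, layers, newest_first)
    return result


# Values from the example: second.yml is loaded before third.yml.
layers = {
    "aaa": ["${_param:yyy}"],
    "ccc": ["${_param:aaa}", 444],
}

print(resolve("ccc", layers, newest_first=True))  # 444
try:
    resolve("ccc", layers, newest_first=False)
except Unresolved as err:
    print("cannot resolve", err.args[0])  # cannot resolve yyy
```

Resolving `aaa` directly still fails in either order, which matches the note above about `_param:aaa`.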

I'm not sure whether that is the correct thing to do, or whether the current behaviour is preferable and the way the _param parameters are organised should be changed instead. I don't use the _param reclass organisation myself, so it's not a problem I've run into.

For the second issue, _param parameters that would fail but are not needed (such as _param:aaa), it would be reasonably clean to add an option that takes a regex and resolves only the parameters matching it, plus any parameters those depend on. The latter could include parameters that do not match the original regex.
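A rough Python sketch of that selection step, assuming a flat dictionary of parameters: the function name `needed_params` is hypothetical and no such option is claimed to exist in reclass. For brevity, `mykey` is placed in the same namespace as the `_param` values.

```python
import re

# Finds every ${_param:name} reference inside a string value.
REF = re.compile(r"\$\{_param:(\w+)\}")


def needed_params(params, pattern):
    """Return names matching `pattern` plus everything they reference."""
    pending = [name for name in params if re.search(pattern, name)]
    needed = set()
    while pending:
        name = pending.pop()
        if name in needed:
            continue
        needed.add(name)
        value = params.get(name)
        if isinstance(value, str):
            # Dependencies are followed even if they do not match the regex.
            pending.extend(REF.findall(value))
    return needed


params = {
    "aaa": "${_param:yyy}",  # broken, but nothing selected depends on it
    "ccc": 444,
    "mykey": "${_param:ccc}",
}

print(sorted(needed_params(params, r"^mykey$")))  # ['ccc', 'mykey']
```

Only the names in the returned set would then be resolved, so the missing `yyy` is never looked up unless `aaa` itself is selected.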

Petr Michalec · Answer 4 · Thu Dec 28 2017 16:56:03 GMT+0800 (China Standard Time)

The original reclass did the interpolation in a different way, so the issue is relevant. Ways to resolve this might be: A) a better algorithm; B) accept the current, backward-incompatible behaviour; C) fix our models on all levels; D) a workaround that does not throw an exception but stores "UNKNOWN" as the value.

Unless someone volunteers to rewrite it according to A), I would go with option D, made conditional. We could possibly still throw an error if an UNKNOWN is left unresolved in the last loop/highest structure. That would also let us summarise all missing interpolations in one error output.
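Option D could look roughly like the following Python sketch. It is an illustration only, not the eventual patch: `interpolate` is a hypothetical helper over a flat dictionary, and `bbb`/`zzz` are made-up names added to show several missing references being collected in one pass.

```python
import re

REF = re.compile(r"\$\{_param:(\w+)\}")
UNKNOWN = "UNKNOWN"  # placeholder stored instead of raising


def interpolate(params):
    """Resolve all references; return (resolved params, missing names)."""
    missing = []

    def resolve(value):
        if not isinstance(value, str):
            return value

        def substitute(match):
            name = match.group(1)
            if name not in params:
                missing.append(name)  # remember it, but keep going
                return UNKNOWN
            return str(resolve(params[name]))

        return REF.sub(substitute, value)

    resolved = {name: resolve(value) for name, value in params.items()}
    return resolved, missing


params = {
    "aaa": "${_param:yyy}",
    "bbb": "${_param:zzz}",
    "ccc": 444,
    "mykey": "${_param:ccc}",
}

resolved, missing = interpolate(params)
print(resolved["mykey"])  # 444
print(missing)            # ['yyy', 'zzz']
```

A caller could then decide whether to raise one error listing everything in `missing`, or only raise when a value that is actually used still contains `UNKNOWN`.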

Andrew Pickford · Answer 5 · Wed Jan 03 2018 23:15:49 GMT+0800 (China Standard Time)

The root cause of the difference is a change in my fork to how merging references is handled. In original reclass, references are first merged (a reference simply overwrites a previous reference) and then evaluated. In my fork, references are first evaluated and then merged. This can be seen with the following:

nodes/node1.yml:

classes:
  - test1
  - test2
  - test3

classes/test1.yml:

parameters:
  a:
    - 1
    - 2
    - 3
  b:
    - 4
    - 5
    - 6

classes/test2.yml:

parameters:
  c: ${a}

classes/test3.yml:

parameters:
  c: ${b}

With original reclass the parameter c evaluates to the list [4,5,6]; with my fork it evaluates to [1,2,3,4,5,6].
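The two orderings can be mimicked in a few lines of Python. This is a simplified model of the test1/test2/test3 classes above, not the reclass implementation; the function names are hypothetical.

```python
params = {"a": [1, 2, 3], "b": [4, 5, 6]}
c_layers = ["${a}", "${b}"]  # values assigned to c by test2, then test3


def deref(ref, params):
    """Look up a reference of the form ${name}."""
    return params[ref[2:-1]]


def merge_then_evaluate(layers, params):
    # Original reclass style: a later reference overwrites an earlier one,
    # and only the surviving reference is evaluated.
    return list(deref(layers[-1], params))


def evaluate_then_merge(layers, params):
    # Fork style: each reference is evaluated, then the lists are merged.
    merged = []
    for ref in layers:
        merged.extend(deref(ref, params))
    return merged


print(merge_then_evaluate(c_layers, params))  # [4, 5, 6]
print(evaluate_then_merge(c_layers, params))  # [1, 2, 3, 4, 5, 6]
```

The second style has to evaluate every reference in the list, which is why an unresolvable earlier reference becomes an error even when a later scalar would replace it.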

@epcim - As a runtime option (since I need the new reference merging style), merging that behaves more like original reclass is doable. But it's bound to have some oddities/differences from the original reclass.

Andrew Pickford · Answer 6 · Wed Jan 03 2018 23:48:23 GMT+0800 (China Standard Time)

How about the following parameter organisation to fix the errors:

parameters:
  _param:
    cluster_node_addresses: {}
    glusterfs_node_addresses: ${_param:cluster_node_addresses}

    test: ${_param:glusterfs_node_addresses:node01}

With cluster_node_addresses and glusterfs_node_addresses as dictionaries, default values can be written to cluster_node_addresses and used by glusterfs_node_addresses. Merging an empty dictionary onto cluster_node_addresses gives a reasonable error if node addresses are missing:

-> node1
   Cannot resolve ${_param:glusterfs_node_addresses:node01}, at _param:test, in yaml_fs:///home/test/reclass/test8/classes/test2.yml

Node addresses can also be directly written into the glusterfs_node_addresses dict after it is merged with the ${_param:cluster_node_addresses} so that they overwrite the values from cluster_node_addresses without changing values in cluster_node_addresses.

Note that this will not work with original reclass.
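The intended layering can be pictured with plain Python dictionaries. This is only an analogy for the merge behaviour described above, not reclass itself: the `merge` helper is hypothetical and the addresses are made-up documentation values.

```python
def merge(*layers):
    """Merge dictionaries left to right; later keys overwrite earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


# cluster_node_addresses: the empty default plus values set by some class.
cluster = merge({}, {"node01": "192.0.2.11", "node02": "192.0.2.12"})

# glusterfs_node_addresses: starts from the cluster values, then a direct
# override is merged on top without touching the cluster dictionary.
glusterfs = merge(cluster, {"node02": "192.0.2.22"})

print(glusterfs["node01"])  # 192.0.2.11
print(glusterfs["node02"])  # 192.0.2.22
print(cluster["node02"])    # 192.0.2.12

# A node that was never defined fails only when it is looked up,
# analogous to the "Cannot resolve" error shown above.
print("node03" in glusterfs)  # False
```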

Petr Michalec · Answer 7 · Fri Jan 12 2018 20:45:04 GMT+0800 (China Standard Time)

I lost my week-old comment. I fully agree with Andrew: this is not a bug, and there is a way to fix the behaviour on our side, a great suggestion that we should implement. tl;dr: the saltclass pillar simply passes ${not:found:option} through if it is not interpolated.

We could optionally do the same (though we probably want to keep throwing an error), but I fear what would happen if such values were passed to system/network configs and then executed :(.

I will send a patch next week, hopefully. We need a workaround first, as changing our shared system models while keeping backward compatibility will take much longer.

Petr Michalec · Answer 8 · Wed Feb 28 2018 18:24:36 GMT+0800 (China Standard Time)

@AndrewPickford and others, please review the proposed fix, plus a feature to print all missed references at once.

Andrew Pickford · Answer 9 · Fri Mar 02 2018 22:24:10 GMT+0800 (China Standard Time)

@epcim I've been swamped with a batch system upgrade so will try out the proposed changes next week.

Petr Michalec · Answer 10 · Mon Mar 12 2018 22:35:58 GMT+0800 (China Standard Time)

@AndrewPickford can you have a quick look today? I would like to merge it quite soon so we can move on.

Petr Michalec · Answer 11 · Fri Mar 16 2018 23:09:37 GMT+0800 (China Standard Time)

Resolved by #18