Improve flex decomposition configuration syntax

Question

morrone opened this issue a year ago · comments

Christopher J. Morrone commented a year ago

I would like to see the flex decomposition's format improved. There is currently a level of indirection that I think we could eliminate. Right now, flex basically does:

"type": "flex",
"decomposition": {
  < various other decompositions (static/as_is) in a hash >
},
"digest": {
}

In the decomposition hash, each decomposition is a value, and the key is an arbitrary label that the configuration author invents. This label is used to connect these decompositions with hash values in another section. This adds a level of indirection when writing and reading decomp files. If we change the format, I think we might be able to eliminate these labels. For instance, we could do:

"type": "flex",
"decomposition_list": [
    {
        "matching_schema_digests": [],
        "matching_schema_patterns": [],
        "decomposition": {}
    }
]

Here I group an array of schema digests and/or an array of schema name patterns directly with a decomposition. This way we eliminate the labels that were previously needed, and make the file a little bit easier for humans to read and write.
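To make the difference between the two layouts concrete, here is a minimal Python sketch that rewrites a label-based flex configuration into the list form proposed above. It assumes the current "digest" section maps each digest to a single label, which this thread only outlines; the function name, the label names, and the short placeholder digests are all hypothetical.

```python
from collections import defaultdict


def to_decomposition_list(flex_cfg):
    """Rewrite a label-based flex config into the proposed list form.

    Assumption: flex_cfg["decomposition"] maps label -> decomposition and
    flex_cfg["digest"] maps digest -> label.
    """
    # Group digests by the label they point at.
    digests_by_label = defaultdict(list)
    for digest, label in flex_cfg["digest"].items():
        digests_by_label[label].append(digest)

    # Emit one list entry per decomposition; the label disappears.
    return {
        "type": "flex",
        "decomposition_list": [
            {
                "matching_schema_digests": digests_by_label.get(label, []),
                "matching_schema_patterns": [],
                "decomposition": decomp,
            }
            for label, decomp in flex_cfg["decomposition"].items()
        ],
    }


# Placeholder config: labels and digests are invented for illustration.
old_cfg = {
    "type": "flex",
    "decomposition": {
        "mem": {"type": "static"},
        "rest": {"type": "as_is"},
    },
    "digest": {
        "DIGEST_A": "mem",
        "DIGEST_B": "mem",
        "DIGEST_C": "rest",
    },
}

new_cfg = to_decomposition_list(old_cfg)
```

After the conversion, each entry carries its own digests, so a reader no longer has to jump between two sections to see what maps where.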

Note also that I have smuggled in an additional feature request: we should make it possible to match on schema name patterns (regex), not only by digests. Digests are a bit of a pain in the rear end. In order to configure ldms with digests, one would already need to have partly configured ldms to learn what the digests are. It is a bit of a chicken-and-egg problem. I am fine with them being an option, but we should probably also provide the easier solution of matching decompositions based on just a schema name regex.
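As a rough illustration of how a lookup against the proposed list could behave, the sketch below picks a decomposition by digest or by schema name regex. The function name, the first-match-wins rule, and the use of a full-string match are assumptions made for the example, not something this issue specifies; the digests are placeholders.

```python
import re


def find_decomposition(decomposition_list, schema_name, schema_digest):
    """Return the decomposition of the first entry that matches, else None."""
    for entry in decomposition_list:
        # An exact digest match selects the entry.
        if schema_digest in entry.get("matching_schema_digests", []):
            return entry["decomposition"]
        # Otherwise any pattern that matches the whole schema name selects it.
        patterns = entry.get("matching_schema_patterns", [])
        if any(re.fullmatch(p, schema_name) for p in patterns):
            return entry["decomposition"]
    return None


# Placeholder list: digests and decomposition bodies are invented.
decomposition_list = [
    {
        "matching_schema_digests": ["DIGEST_A"],
        "matching_schema_patterns": [],
        "decomposition": {"type": "static"},
    },
    {
        "matching_schema_digests": [],
        "matching_schema_patterns": ["vmstat.*"],
        "decomposition": {"type": "as_is"},
    },
]

by_digest = find_decomposition(decomposition_list, "meminfo", "DIGEST_A")
by_pattern = find_decomposition(decomposition_list, "vmstat_v2", "DIGEST_Z")
no_match = find_decomposition(decomposition_list, "procstat", "DIGEST_Z")
```

With pattern matching available, the second entry can be written without running ldms first to learn any digest.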

Tom Tucker · Answer 1 · Sat Mar 25 2023 01:22:45 GMT+0800 (China Standard Time)

The digest is a SHA256 hash of the metric names and types that make up the schema. It is computed inside ldms when the schema is constructed and carried in the set meta-data when a set is constructed with the schema. The digest makes it possible to discriminate between two schemas with the same name, but different definitions. The digest is reported by ldms_ls by adding an extra -v to the options:

tom@ovs-5416 ~/work/sos/rpc
$ ldms_ls -h localhost -p 10402 -a munge -vv
Hostname    : localhost
IP Address  : 127.0.0.1
Port        : 10402
Transport   : sock
Schema Digest                                                    Schema         Instance                 Flags  Msize  Dsize  Hsize  UID    GID    Perm       Update            Duration          Info
---------------------------------------------------------------- -------------- ------------------------ ------ ------ ------ ------ ------ ------ ---------- ----------------- ----------------- --------
CF9BBA30180E8FF29E993FD8306EC3A618887C26CD3D86D0E6CDF5956B285537 vmstat         sampler-6/vmstat            CR    8440   1320      0      0      0 -rwxrwxrwx 1679678180.002079          0.000129 "updt_hint_us"="1000000:0"
DE06AA971422A79F057338003F42583338379A2C914BCF77DB506422745296BC procstat       sampler-6/procstat          CR    1784   3776      0      0      0 -rwxrwxrwx 1679678180.002267          0.000179 "updt_hint_us"="1000000:0"
39E8567D4EB7DA70FEC57FE8964B2CCA044546CA04C6C2F58C596860FEDAE8AF meminfo        sampler-6/meminfo           CR    2840    520      0      0      0 -rwxrwxrwx 1679678181.001396          0.000070 "updt_hint_us"="1000000:0"
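If the digests have to be copied into a configuration file, a small script can collect them from output like the listing above. This is only a sketch: it assumes each data row starts with a 64-character hex digest followed by the schema name and the instance name, as in this sample, and other ldms_ls versions may lay the columns out differently. The function name is hypothetical and the sample rows are shortened copies of the rows shown here.

```python
import re

# A data row starts with a 64-hex-digit digest, then schema, then instance.
ROW = re.compile(r"^([0-9A-Fa-f]{64})\s+(\S+)\s+(\S+)")


def digests_by_schema(ldms_ls_output):
    """Map each schema name to the set of digests seen for it."""
    result = {}
    for line in ldms_ls_output.splitlines():
        match = ROW.match(line)
        if match:
            digest, schema, _instance = match.groups()
            result.setdefault(schema, set()).add(digest.upper())
    return result


sample = """\
Schema Digest                                                    Schema         Instance
---------------------------------------------------------------- -------------- ------------------------
CF9BBA30180E8FF29E993FD8306EC3A618887C26CD3D86D0E6CDF5956B285537 vmstat         sampler-6/vmstat
39E8567D4EB7DA70FEC57FE8964B2CCA044546CA04C6C2F58C596860FEDAE8AF meminfo        sampler-6/meminfo
"""

digests = digests_by_schema(sample)
```

A schema name that ends up with more than one digest in the result is exactly the case where two definitions share a name.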

The digest is a pain, but I see no other way to unambiguously map schema decomposition rules to rows without the ability to tag schema with a name that the user cannot get wrong.
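To illustrate why a digest can tell apart two schemas that share a name, here is a small sketch that hashes metric names and types with SHA256. The byte layout is invented for the example, so it will not reproduce the digests that ldms computes; the metric names, the type strings, and the function name are hypothetical as well.

```python
import hashlib


def schema_digest(metrics):
    """Hash (name, type) pairs in schema order.

    Hypothetical layout: each field is followed by a NUL byte. The real
    ldms layout is not described in this thread.
    """
    h = hashlib.sha256()
    for name, metric_type in metrics:
        h.update(name.encode("utf-8") + b"\0")
        h.update(metric_type.encode("utf-8") + b"\0")
    return h.hexdigest().upper()


# Two schemas that would both be called "meminfo" but differ by one metric.
meminfo_a = [("MemTotal", "u64"), ("MemFree", "u64")]
meminfo_b = [("MemTotal", "u64"), ("MemFree", "u64"), ("MemAvailable", "u64")]

digest_a = schema_digest(meminfo_a)
digest_b = schema_digest(meminfo_b)
```

The two digests differ even though the schema name is the same, which is the property the digest-based mapping relies on.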

It is possible to get rid of the schema name in the digest definition and also to combine the "directory" with the definition. I can't remember why we did it this way other than it provides a convenient dictionary at the bottom to easily read what is mapped where instead of having the keys scattered throughout the file.

WRT comments, we can convert to using the libjansson library instead of the internal libovis_json. This library is more comprehensive than our internal one IMO and supports comments. We could also change the libovis_json parser to support comments.

I think when the configuration is simple, YAML is easier to write and read than JSON. However, if you start describing more sophisticated relationships, I think JSON is more concise and overall easier to read and write. YAML falls over in my view, especially given that you can write the same relationship at least 3 different ways (I'm speaking about the {} syntax vs. the : syntax vs. pick something else).

Christopher J. Morrone · Answer 2 · Sat Mar 25 2023 09:11:54 GMT+0800 (China Standard Time)

> The digest is a pain, but I see no other way to unambiguously map schema decomposition rules to rows without the ability to tag schema with a name that the user cannot get wrong.

The design already supports mapping multiple digests to the same decomposition. Pretty much the only time this sort of thing makes sense is when there are multiple variants of the same basic schema, as in meminfo. If we are making a decomposition that we know is the common denominator of 3 or 4 different meminfo variants, then why force our users (and here I mean me) to log in to 4 different architectures and manually retrieve the four digest variants, when I know that a simple "meminfo*" pattern will cover my needs?

I am not entirely convinced that I'll want to use digests at all. Maybe I will for some things, and maybe for other things I'll want schema name patterns. I think that flexibility really is warranted here.
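One detail worth settling if patterns are added: "meminfo*" reads like a shell glob, but as a regular expression it means "meminf" followed by zero or more "o" characters. The sketch below compares the two readings using Python's standard library; the schema variant names are hypothetical.

```python
import fnmatch
import re

# Hypothetical schema names for meminfo variants.
names = ["meminfo", "meminfo_x86", "meminfo_arm"]

# "meminfo*" as an anchored regex: only the bare name matches.
as_regex = [n for n in names if re.fullmatch("meminfo*", n)]

# "meminfo.*" is the regex spelling of "meminfo followed by anything".
as_dot_star = [n for n in names if re.fullmatch("meminfo.*", n)]

# "meminfo*" as a shell-style glob: all variants match.
as_glob = [n for n in names if fnmatch.fnmatchcase(n, "meminfo*")]

# An unanchored regex search matches any name that contains "meminf".
as_search = [n for n in names if re.search("meminfo*", n)]
```

Here `as_regex` contains only "meminfo", while the other three lists contain all three names, so the documentation should state whether patterns are globs or regexes and whether they are anchored.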

Christopher J. Morrone · Answer 3 · Sat Mar 25 2023 09:19:50 GMT+0800 (China Standard Time)

Also, any time "static" decomposition is used, we have a fixed set of values that we are selecting from a schema. As long as schemas change in the form of only adding new fields, the decomposition will remain compatible and the schema name pattern matching will work for that as well.
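The compatibility argument can be written as a simple check: a static decomposition stays usable for a schema as long as every metric it selects is still present. The function and the metric names below are hypothetical and only sketch that reasoning.

```python
def missing_metrics(selected, schema_metrics):
    """Return the selected metric names that the schema does not provide."""
    available = set(schema_metrics)
    return [name for name in selected if name not in available]


# Metrics a hypothetical static decomposition selects.
selected = ["MemTotal", "MemFree"]

# Original schema, a variant that only adds a field, and one that drops one.
schema_v1 = ["MemTotal", "MemFree"]
schema_v2 = ["MemTotal", "MemFree", "MemAvailable"]
schema_v3 = ["MemTotal", "MemAvailable"]

missing_v1 = missing_metrics(selected, schema_v1)
missing_v2 = missing_metrics(selected, schema_v2)
missing_v3 = missing_metrics(selected, schema_v3)
```

Only the variant that removes a field reports anything missing; the variant that adds a field remains compatible, which is why a name pattern is enough in that case.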

Tom Tucker · Answer 4 · Sat Mar 25 2023 15:41:31 GMT+0800 (China Standard Time)

I get it. We can certainly add the ability to match on regex in the directory or wherever we move that to.
