Merge "lazyness" behaviour changes between nickel==1.2.2 and nickel==1.3.0

Question

uhlajs opened this issue 8 months ago · comments

Let's take the following snippet:

let module_a = {
  inputs | not_exported = {
      name
        | String,
    },
  module_a_name = inputs.name
}
in
let module_b = {
  inputs | not_exported = {
      name
        | String,
    },
  module_b_name = inputs.name
}
in
{
  stack | not_exported = {
      my_module_a =
        module_a 
        & {
          inputs = {
            name = "my_module_a",
          }
        },
      my_module_b =
        module_b
        & {
          inputs = {
            name = "my_module_b",
          }
        }
    },
    config =
    std.record.values
      stack
    |> std.record.merge_all
}

With nickel==1.2.2, the snippet produces the "expected" output:

❯ nickel export < test.ncl
{
  "config": {
    "module_a_name": "my_module_a",
    "module_b_name": "my_module_b"
  }
}

With nickel==1.3.0, the snippet produces an error:

❯ nickel export < test.ncl
error: non mergeable terms
     ┌─ <stdin>:23:20
     │
  23 │             name = "my_module_a",
     │                    ^^^^^^^^^^^^^ cannot merge this expression
     ·
  30 │             name = "my_module_b",
     │                    ^^^^^^^^^^^^^ with this expression
     │
     ┌─ <stdlib/std.ncl>:2066:41
     │
2066 │       = fun rs => (std.array.fold_left (&) {} (rs | Array Dyn)) | { _ : Dyn },
     │                                         - originally merged here
     │
     = Both values have the same merge priority but they can't be combined.
     = Primitive values (Number, String, and Bool) or arrays can be merged only if they are equal.
     = Functions can never be merged.

If I replace the config definition with:

  config = stack.my_module_a & stack.my_module_b

both Nickel versions produce the error. Since the Nickel implementation of std.record.merge_all did NOT change between these versions, I guess that something changed inside the Rust implementation.

Questions:

What is the expected nickel behavior for merging these two/four records?
If the behavior of nickel==1.3.0 is correct, how can I modify the snippet so that I get the same final record as with nickel==1.2.2 and the std.record.merge_all implementation?

For reference the idea of using inputs is similar to Toward Modules.

Probably related to !819.

Viktor Kleen · Answer 1 · Fri Jan 12 2024 20:33:10 GMT+0800 (China Standard Time)

I think the root cause is a difference in whether the inputs field in config gets evaluated or not. I suppose that as of Nickel 1.3, it gets evaluated even if it's not exported and evaluation can't succeed because of the different inputs records in module_a and module_b. I think the behavior in Nickel 1.3 is somewhat reasonable, although I'm not sure why inputs is forced...
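To illustrate the kind of conflict involved, here is a minimal sketch (a hypothetical reduction, not code from the report) of the merge rule quoted in the error message: two primitive values at the same priority merge only if they are equal.

```nickel
# Hypothetical minimal reduction, not taken from the original report.
{
  # Equal strings at the same merge priority can be merged.
  same = { name = "my_module_a" } & { name = "my_module_a" },

  # Uncommenting the next field and exporting should raise
  # "non mergeable terms", because the two strings differ.
  # different = { name = "my_module_a" } & { name = "my_module_b" },
}
```

In the original snippet, the two inputs.name definitions play the role of the differing field, which is why the merge fails as soon as inputs is evaluated.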

Viktor Kleen · Answer 2 · Fri Jan 12 2024 20:42:11 GMT+0800 (China Standard Time)

A workaround, and probably an avenue for cleaner semantics here, would be to introduce an evaluation phase distinction: first process all overrides to a module, then remove the inputs field using std.record.remove, and then merge the entire stack. Without something like that there won't be any way to distinguish which module's inputs field is getting targeted, I think.

Honza Uhlík · Answer 3 · Fri Jan 12 2024 20:53:06 GMT+0800 (China Standard Time)

I think I understand the idea, but I have no clue how to process all overrides to a module. Could you please elaborate a bit on that?

Viktor Kleen · Answer 4 · Fri Jan 12 2024 21:23:49 GMT+0800 (China Standard Time)

I was thinking something along the lines of:

{
  module_a = {
    inputs | not_exported = {
        name
          | String,
      },
    module_a_name = inputs.name
  },
  module_b = {
    inputs | not_exported = {
        name
          | String,
      },
    module_b_name = inputs.name
  },

  remove_inputs = fun r => r |> std.record.map_values std.function.id |> std.record.remove "inputs",

  stack | not_exported = {
      my_module_a =
        module_a 
        & {
          inputs = {
            name = "my_module_a",
          }
        },
      my_module_b =
        module_b
        & {
          inputs = {
            name = "my_module_b",
          }
        }
    },

    config =
    std.record.values
      stack
    |> std.array.map remove_inputs
    |> std.record.merge_all
}

In essence, remove_inputs forces all recursive field dependencies to be resolved, and then gets rid of the inputs field. This also means that you won't reasonably be able to override values using merging anymore after applying remove_inputs. Also, the std.record.map_values trick is arcane, so I'd really like to come up with a better way of doing these things eventually.

Honza Uhlík · Answer 5 · Fri Jan 12 2024 21:46:40 GMT+0800 (China Standard Time)

Got it, many thanks for this workaround! It works reasonably for my case, since I don't need to override values any more.

I agree that it would be nice to have better support for these "module-like" semantics (without this ugly early-evaluation hack).

Yann Hamdaoui · Answer 6 · Fri Jan 12 2024 21:47:08 GMT+0800 (China Standard Time)

@vkleen Isn't r |> std.record.map_values std.function.id just r, or am I missing something?

Viktor Kleen · Answer 7 · Fri Jan 12 2024 21:52:54 GMT+0800 (China Standard Time)

The contract on std.record.map_values destroys the recursive thunks in r. Without it, you'll get

error: unbound identifier `inputs`
   ┌─ /home/vkleen/work/tweag/nickel/nickel/master/test.ncl:14:21
   │
14 │     module_b_name = inputs.name
   │                     ^^^^^^ this identifier is unbound

Yann Hamdaoui · Answer 8 · Fri Jan 12 2024 21:53:53 GMT+0800 (China Standard Time)

@uhlajs I would personally take a different approach: in some sense you have a namespace issue. In the merging model, it's a bit more annoying to handle. Here is the thing:

If all your modules share their inputs field, because you merge them, then any parameter appearing in this inputs field is "shared" and should be the same for all modules. Think of an environment of some sort.

Here, you use name as a local module parameter, which is different for each module. One solution is to namespace it: for example, either use name_module_a or module_a.name or a name that is not the same as for module_b.
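A minimal sketch of the namespacing idea, assuming each module keeps its parameter under its own sub-record of the shared inputs field. The nesting inputs.module_a.name and inputs.module_b.name is a hypothetical layout chosen for illustration, not something prescribed above.

```nickel
# Hypothetical layout: each module reads its parameter from its own
# namespace inside the shared `inputs` record.
let module_a = {
  inputs | not_exported = {
    module_a = { name | String },
  },
  module_a_name = inputs.module_a.name,
}
in
let module_b = {
  inputs | not_exported = {
    module_b = { name | String },
  },
  module_b_name = inputs.module_b.name,
}
in
{
  # The two `name` parameters live under different paths,
  # so merging everything together should no longer conflict.
  config =
    module_a
    & module_b
    & { inputs.module_a.name = "my_module_a" }
    & { inputs.module_b.name = "my_module_b" },
}
```

With this layout, inputs stays a single shared record, so genuinely global parameters could still be added next to the per-module namespaces.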

Another solution would be to have a different field for such local inputs, such as locals or local_inputs or whatever. You could then remove it as @vkleen is doing. Or not merging it in the first place and just merge the config field of each module (but in this case, you can't have a "global" shared inputs anymore):

config =
    stack
    |> std.record.values
    |> std.array.map (std.record.get "config") # std.record.get was introduced in 1.4; it's just fun field r => r."%{field}"
    |> std.array.map remove_inputs
    |> std.record.merge_all

Yann Hamdaoui · Answer 9 · Fri Jan 12 2024 21:55:43 GMT+0800 (China Standard Time)

The contract on std.record.map_values destroys the recursive thunks in r.

Oh, interesting. So record.remove doesn't freeze the record. Which is not entirely unreasonable, but can be surprising. Maybe it's time to add record.freeze or record.fix to do that

Honza Uhlík · Answer 10 · Fri Jan 12 2024 23:05:21 GMT+0800 (China Standard Time)

@yannham I was playing a bit with your suggestions (thanks for them!) and here is one "counter-example" for the namespacing of the local module parameter. Let's consider a slightly different example. I have a module that I want to use twice with different input values.

{
  module | not_exported = {
    inputs | not_exported = {
        name
          | String,
      },
    
    resource."%{inputs.name}" = {},

    config | not_exported = {
      resource."%{inputs.name}" = {}
    }
  },

  stack | not_exported = {
      my_module_a =
        module
        & {
          inputs = {
            name = "my_module_a",
          }
        },
      my_module_b =
        module
        & {
          inputs = {
            name = "my_module_b",
          }
        },
    },
  # Obviously doesn't work
  # config = stack.my_module_a & stack.my_module_b,
  remove_inputs | not_exported = fun r => r |> std.record.map_values std.function.id |> std.record.remove "inputs",
  std_record_get | not_exported = fun field r => r."%{field}",

  config_vkleen =
    stack
    |> std.record.values
    |> std.array.map remove_inputs
    |> std.record.merge_all,

  config =
    stack
    |> std.record.values
    |> std.array.map (std_record_get "config")
    |> std.record.merge_all
}

Now, I cannot change the input parameter name, since the module is the same. On the other hand, the alternative solution of taking only module.config works nicely. Actually, not having a "global" shared inputs is quite desirable here.

Yann Hamdaoui · Answer 11 · Sat Jan 13 2024 01:04:07 GMT+0800 (China Standard Time)

Ah, yes, this approach doesn't work if you have several copies of the same module. By nature, merging combines pieces together in one final value and I think it's hard to avoid that everything lives in the same namespace, at least when just using bare merging (though you could use contracts to restrict access probably).

The thing is, in the blog post, I argue that using record merging is more adapted than functions because it's a flat structure, it's easy to override and thus to reconfigure by relying on Nickel's merge system, and easy to combine.

If you just want to instantiate some parameter differently and you are dead sure you don't want to access this parameter from the outer world (say, another module) or override it after the fact (you or any other consumer of your code), I start to wonder if a simple plain function would actually do the trick.

That is, defining module_a = fun name => {...module def without inputs and using name instead of inputs.name...}. Or, a way to see this would be as a function that generates a concrete module from a parameter. In that case, name would probably need to be unique (each time it is provided) and you lose the ability to override it, but depending on your use-case, it might be a good tradeoff. Modules are useful, but not everything has to be a module.
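A minimal sketch of this function-based variant, reusing the resource field from the example in the previous comment. The name make_module is hypothetical and only introduced for illustration.

```nickel
# `make_module` is a hypothetical helper: it generates a concrete
# module from a parameter instead of reading it from `inputs`.
let make_module = fun name => {
  resource."%{name}" = {},
}
in
{
  # Each call yields a record without an `inputs` field, so the
  # results can be merged directly as long as the names are unique.
  config =
    [make_module "my_module_a", make_module "my_module_b"]
    |> std.record.merge_all,
}
```

The tradeoff is the one described above: name can no longer be overridden through merging once the module has been generated.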

All of that being said,

I'm still clueless as to why this code doesn't fail in 1.2.2. I think, like Viktor, that it's reasonable that it fails in 1.3.0, but after a quick look I couldn't spot an obvious change in serialization or in the handling of not_exported. Maybe some obscure recursive record bug fix changed that.
Whether or not functions are the right choice for your particular use-case, it's still an interesting problem in general to think about those local variables, name-spacing issues, and "generative" modules. If I'm not mistaken, in the NixOS module system, they take the same "only consider module.config" route, so that when defining modules you can cross-refer to other module inputs (with e.g. module_a.inputs = ... from within module_b.config), but only config fields are combined in the end to form the final result.

Sebastien Mamessier · Answer 12 · Sat Jan 13 2024 04:25:03 GMT+0800 (China Standard Time)

What about this simple workaround

let module_a = {
  inputs | not_exported = {
    name | String,
  },
  output.module_a_name = inputs.name
}
in
let module_b = {
  inputs | not_exported = {
    name | String,
  },
  output.module_b_name = inputs.name
}
in
{
  stack | not_exported = {
      my_module_a = (module_a & {
        inputs.name = "my_module_a",
      }).output,

      my_module_b = (module_b & {
        inputs.name = "my_module_b",
      }).output
    },

  config = stack.my_module_a & stack.my_module_b
}

Honza Uhlík · Answer 13 · Sat Jan 13 2024 18:35:36 GMT+0800 (China Standard Time)

@smamessier Your workaround is basically identical to @yannham's alternative proposal:

Or not merging it in the first place and just merge the config field of each module (but in this case, you can't have a "global" shared inputs anymore)

The following snippet just moves the complexity from the module field assignment to the config field assignment:

  config =
    stack
    |> std.record.values
    |> std.array.map (std_record_get "config")
    |> std.record.merge_all

Sebastien Mamessier · Answer 14 · Mon Jan 15 2024 06:04:29 GMT+0800 (China Standard Time)

@uhlajs Yes, indeed, I somehow missed that proposal, as I saw that @yannham's snippet was still removing the inputs fields.
I think this is quite clean. This is what you would do in cuelang, for example.

Honza Uhlík · Answer 15 · Sat Jan 20 2024 00:07:03 GMT+0800 (China Standard Time)

Since the behavior of nickel==1.3.0 seems to be the "expected" one, I am going to close this issue. Feel free to reopen it if necessary.

Summary for everyone who hits this issue in the future: we ended up with a solution similar to @yannham's proposal. Right now, we extract only a specific subset of fields from each module, so we no longer face the inputs merge "issue".

Many thanks to @yannham and @vkleen!