calliope-project / calliope

A multi-scale energy systems modelling framework

Home Page: https://www.callio.pe


Easily define group constraints with custom math

jnnr opened this issue · comments

What can be improved?

Here I summarize some feedback on using group constraints in calliope 0.7 with custom math and tabular data, after working with them in a use case and discussing with colleagues. Good news: it works. There is room for improvement regarding intuitive usability.

How do we define a constraint on the summed capacity for some defined subsets of nodes? The following works, as proposed by @brynpickering:

Define which region belongs to which group in node_groups.csv

region_1,group_1,true
region_2,group_1,true
region_3,group_2,true

Define the flow_cap of each group in node_groups_flow_cap.csv

group_1,120
group_2,80

Define new custom math in flow_cap_group_constraint.yaml

constraints:
  flow_cap_per_node:
    foreach: [cap_node_groups, techs]
    where: "base_tech=supply"
    equations:
      - expression: sum(flow_cap[carriers=electricity] * default_if_empty(lookup_node_group, 0), over=nodes) <= node_group_flow_cap

Put it together by loading the above files in model.yaml.

cap_node_groups:
  source: tabular-data/scalars/node_groups.csv
  rows: [nodes, cap_node_groups]
  add_dimensions:
    parameters: lookup_node_group

node_groups_flow_cap:
  source: tabular-data/scalars/node_groups_flow_cap.csv
  rows: [node_groups]
  columns: [parameters]
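Put together, the expression computes one capacity sum per group and compares it with that group's cap. A pure-Python sketch of the semantics, with made-up capacity numbers (the real model does this on xarray arrays; `group_cap` is a hypothetical helper, not calliope API):

```python
# Sketch of what the custom-math expression computes, using made-up numbers.
# default_if_empty(lookup_node_group, 0) means nodes with no entry in the
# lookup contribute 0 to the group's sum.
flow_cap = {"region_1": 50.0, "region_2": 60.0, "region_3": 70.0}
lookup_node_group = {  # (node, group) -> True, as in node_groups.csv
    ("region_1", "group_1"): True,
    ("region_2", "group_1"): True,
    ("region_3", "group_2"): True,
}
node_group_flow_cap = {"group_1": 120.0, "group_2": 80.0}  # group caps

def group_cap(group):
    """Summed flow_cap over the nodes that are members of `group`."""
    return sum(
        cap * (1 if lookup_node_group.get((node, group)) else 0)
        for node, cap in flow_cap.items()
    )

# The constraint then requires group_cap(g) <= node_group_flow_cap[g]:
for group, cap_max in node_group_flow_cap.items():
    print(group, group_cap(group), "<=", cap_max, group_cap(group) <= cap_max)
```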

Feedback after some discussion:

  1. We can make a node a member of two groups by adding a line.
  2. The CSV version seems OK, but the data=true column is confusing. I know this is because of how calliope broadcasts this data to the full set for each dimension. But for usability, let's pretend we can drop that, so the file would look like this:

node_groups.csv

region_1,group_1
region_2,group_1
region_2,group_2
region_3,group_2
  3. Another, very intuitive way was implemented in the group constraints of calliope v0.6:

node_group_1: [region_1, region_2]
node_group_2: [region_3]

When we discussed, we said that something like that could offer an intuitive way to parametrize the model, too.

  4. The sum in the constraint uses default_if_empty, which is a kind of hack that allows summing only over the nodes that are members of the node group. It works, but it is not very intuitive. Can't we slice the variable flow_cap directly using a subset?
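For what it's worth, the v0.6-style "group: [members]" mapping and the tabular (node, group) lookup carry the same information, so one can be converted mechanically into the other. A hypothetical converter:

```python
# Hypothetical converter: v0.6-style "group: [members]" mapping to the
# (node, group) -> True lookup that backs node_groups.csv.
v06_groups = {
    "node_group_1": ["region_1", "region_2"],
    "node_group_2": ["region_3"],
}

lookup = {
    (node, group): True
    for group, members in v06_groups.items()
    for node in members
}
print(sorted(lookup))
# [('region_1', 'node_group_1'), ('region_2', 'node_group_1'), ('region_3', 'node_group_2')]
```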

Version

v0.7.0.dev3

Pitching in here: I really like approach number 4. It's very intuitive.

As for default_if_empty: I think we should arrive at a style that makes it explicit that using subsets in a sum or a constraint will ignore cases outside the specified subsets. Mostly to encourage explicit code, which is easier to maintain and read.

The implementation in v0.6 was reasonably intuitive, but it was a nightmare to generalise in code!
We could simplify things by adding a groupby helper function, although it needs some thought, e.g.:

parameters:
  node_grouping:
    data: [group_1, group_1, group_2, group_2, group_3]
    index: [region1, region1_1, region1_2, region1_3, region2]
    dims: [nodes]
  flow_cap_node_groups:
    data: [1, 2, 3]
    index: [group_1, group_2, group_3]
    dims: cap_node_groups
Equivalent in CSV

region1,group_1
region1_1,group_1
region1_2,group_2
region1_3,group_2
region2,group_3

and

group_1,1
group_2,2
group_3,3

In math:

constraints:
  flow_cap_per_node:
    foreach: [cap_node_groups, techs]
    where: "base_tech=supply"
    equations:
      - expression: "groupby(flow_cap, node_grouping, how=sum) <= flow_cap_node_groups"

As nice as this is, flow_cap.groupby(node_grouping).sum() creates the new coordinate "node_grouping" when we need it to be "cap_node_groups". There needs to be some way to inform the function what its new coordinate name should be (in code, it would be flow_cap.groupby(node_grouping.rename("cap_node_groups")).sum()). Perhaps an attribute attached to node_grouping? E.g. "groups_on" or something.

BTW, default_if_empty isn't needed if you add the lookup parameter to the schema with a default value assigned to it (although you don't even need that if there is some kind of "groupby" function).

Also, although this is neater, it does go against the way we're currently setting up dimensions, as referenced by @jnnr in the original post. Ideally we use boolean arrays to map between different dimensions, not the dimension names in the array. This is partly a data storage issue: very large string arrays will bloat the model size in a way that boolean arrays won't. But it's also to reduce the likelihood of errors (a misspelling of a value in node_grouping will be difficult to catch until you've already sent the model to be built, which is a time-consuming part of the process where you want to limit errors).

> As nice as this is, flow_cap.groupby(node_grouping).sum() creates the new coordinate "node_grouping" when we need it to be "cap_node_groups". There needs to be some way to inform the function what its new coordinate name should be (in code, it would be flow_cap.groupby(node_grouping.rename("cap_node_groups")).sum()). Perhaps an attribute attached to node_grouping? E.g. "groups_on" or something.

Could be a way to go. Regarding the renaming of the grouped parameter: I think the name is correct already. The coordinate is node_grouping. In this example, node_grouping is used in a constraint on flow_cap, but it could be used to constrain other variables. So, I would say, rename the dim of the parameter flow_cap_node_groups to node_grouping instead.

> Also, although this is neater, it does go against the way we're currently setting up dimensions, as referenced by @jnnr in the original post. Ideally we use boolean arrays to map between different dimensions, not the dimension names in the array. This is partly a data storage issue: very large string arrays will bloat the model size in a way that boolean arrays won't. But it's also to reduce the likelihood of errors (a misspelling of a value in node_grouping will be difficult to catch until you've already sent the model to be built, which is a time-consuming part of the process where you want to limit errors).

I could probably live with the boolean way of defining the groups. May be less intuitive, but on the other hand more explicit. And if it helps avoid errors, it's good.

> Regarding the renaming of the grouped parameter: I think the name is correct already. The coordinate is node_grouping.

Sadly not. The coordinate needs to be cap_node_groups as that is the dimension over which flow_cap_node_groups is indexed. When you use a grouper array to group values in another array what you get is a new dimension on the grouped array that has the name of the grouper array:

In [2]: import calliope

In [3]: m = calliope.examples.national_scale()  #  with grouping parameters applied as in an earlier comment on this thread.

In [4]: m.build() 

In [5]: m.backend.variables.flow_cap.groupby(m.inputs.node_grouping).sum(min_count=1)
Out[5]: 
<xarray.DataArray 'flow_cap' (node_grouping: 3, techs: 8, carriers: 1)> Size: 192B
array([[[nan],
        [<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e454350>],
        [<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e4557d0>],
        [<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e4548d0>],
        [<pyomo.core.expr.numeric_expr.LinearExpression object at 0x153fc3400>],
        [<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e4572d0>],
        [<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e456350>],
        [<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e4563d0>]],

       [[nan],
        [nan],
        [<pyomo.core.expr.numeric_expr.LinearExpression object at 0x153fc0f70>],
        [nan],
        [nan],
        [<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e454750>],
        [<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e455350>],
        [nan]],

       [[<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e4560d0>],
        [nan],
        [nan],
        [<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e4574d0>],
        [nan],
        [nan],
        [nan],
        [<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e456a50>]]],
      dtype=object)
Coordinates:
  * techs          (techs) object 64B 'battery' 'ccgt' ... 'region1_to_region2'
  * carriers       (carriers) object 8B 'power'
  * node_grouping  (node_grouping) object 24B 'group_1' 'group_2' 'group_3'

^ in this example, we need the node_grouping coordinate to actually be cap_node_groups

OK, so here are two options that I would like some votes on: @jnnr @tud-mchen6 @FLomb @irm-codebase. I'll post them as individual comments below this for you to add emojis to.

Both assume you have some node grouping and want to apply a maximum to the sum of all technology capacities in each group. Constraining values per group:

parameters:
  flow_cap_node_groups:
    data: [1, 2, 3]
    index: [group_1, group_2, group_3]
    dims: cap_node_groups

Option 1

Have a groupby helper function and define your grouping parameter as string items, e.g.:

Model definition

grouper as top-level parameter

parameters:
  node_grouping:
    data: [group_1, group_1, group_2, group_2, group_3]
    index: [region1, region1_1, region1_2, region1_3, region2]
    dims: [nodes]

grouper as node/tech-level parameter

nodes:
  region1:
    node_grouping: group_1
  ...

Math definition

constraints:
  flow_cap_per_node:
    foreach: [cap_node_groups, techs]
    where: "base_tech=supply"
    equations:
      - expression: "groupby(flow_cap, node_grouping, how=sum, rename_coord=cap_node_groups) <= flow_cap_node_groups"

The coordinate renaming could be somehow embedded in the parameter attributes before building the optimisation problem, so that it doesn't have to be defined explicitly in the math.

The LHS of the equation would be effectively applying the following python function:

m.backend.variables.flow_cap.groupby(m.inputs.node_grouping.rename("cap_node_groups")).sum(min_count=1)
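In plain Python terms (made-up capacity numbers, no xarray), the groupby-then-sum boils down to the following; the keys of the result play the role of the renamed cap_node_groups coordinate:

```python
# Pure-Python sketch of Option 1's groupby + sum, with made-up capacities.
flow_cap = {"region1": 1.0, "region1_1": 2.0, "region1_2": 3.0,
            "region1_3": 4.0, "region2": 5.0}
node_grouping = {"region1": "group_1", "region1_1": "group_1",
                 "region1_2": "group_2", "region1_3": "group_2",
                 "region2": "group_3"}

def groupby_sum(values, grouper):
    """Sum `values` into buckets defined by `grouper` (node -> group)."""
    out = {}
    for node, value in values.items():
        out[grouper[node]] = out.get(grouper[node], 0.0) + value
    return out

print(groupby_sum(flow_cap, node_grouping))
# {'group_1': 3.0, 'group_2': 7.0, 'group_3': 5.0}
```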

Option 2

Have a where helper function inside math expressions that allows you to mask arrays using a boolean array on-the-fly.

Model definition

parameters:
  node_grouping:
    data: True
    index: [[group_1, region1], [group_1, region1_1], [group_2, region1_2], [group_2, region1_3], [group_3, region2]]
    dims: [cap_node_groups, nodes]

Math definition

constraints:
  flow_cap_per_node:
    foreach: [cap_node_groups, techs]
    where: "base_tech=supply"
    equations:
      - expression: "sum(where(flow_cap, node_grouping), over=nodes) <= flow_cap_node_groups"

The LHS of the equation would be effectively applying the following python function:

m.backend.variables.flow_cap.where(m.inputs.node_grouping.fillna(False).astype(bool)).sum("nodes", min_count=1)
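Again in plain Python (same made-up capacities as the Option 1 sketch): the boolean (group, node) mask selects which nodes enter each group's sum, and unmapped nodes are simply dropped (in the xarray version, min_count=1 keeps a fully empty group as NaN rather than 0):

```python
# Pure-Python sketch of Option 2's masked sum, with made-up capacities.
flow_cap = {"region1": 1.0, "region1_1": 2.0, "region1_2": 3.0,
            "region1_3": 4.0, "region2": 5.0}
mask = {  # (group, node) -> True, as in the node_grouping parameter
    ("group_1", "region1"): True, ("group_1", "region1_1"): True,
    ("group_2", "region1_2"): True, ("group_2", "region1_3"): True,
    ("group_3", "region2"): True,
}

def masked_sum(group):
    """Sum flow_cap over the nodes the mask assigns to `group`."""
    return sum(cap for node, cap in flow_cap.items() if mask.get((group, node)))

print({g: masked_sum(g) for g in ("group_1", "group_2", "group_3")})
# {'group_1': 3.0, 'group_2': 7.0, 'group_3': 5.0} (same result as Option 1)
```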

Option 2 seems much easier to parse for a user. Possibly also cleaner to implement?

Thanks Bryn! I also prefer option 2. Some comments, though:

  • I think there is a bit too much repetition going on due to specifying cap_node_groups every time.
  • Setting this as a parameter is a bit confusing: these are subsets, in reality. We do not really define dims in the model config, so it's understandable, but still worth pointing out...

Would something like this be possible, somehow? Otherwise, feel free to ignore!

subdims:  # or subset?
  node_groupings:
    index:  # data [1, 2, 3] can be inferred from this!
      - group_1: [region1, region1_1]
      - group_2: [region1_2, region1_3]
      - group_3: [region2]
    dims: nodes  # tell it what to check for!
parameters:
  flow_cap_node_groups:
    data: [3000, 4000, 32332]
    index: [group_1, group_2, group_3]
    dims: node_groupings
constraints:
  flow_cap_per_node:
    foreach: [node_groupings, techs]
    where: "base_tech=supply"
    equations:
      - expression: "sum(where(flow_cap, node_groupings), over=nodes) <= flow_cap_node_groups"

> Would something like this be possible, somehow?

I'm going to say that it wouldn't be feasible. It also doesn't really reduce the amount of work for the user. The main problem with subsets is that you can't store them in an xarray dataset as anything other than data variables, i.e. alongside all the other parameters. Therefore they're much more intuitively parameters within the context of calliope math components.