NVIDIA / earth2mip

Earth-2 Model Intercomparison Project (MIP) is a python framework that enables climate researchers and scientists to inter-compare AI models for weather and climate.

Home Page: https://nvidia.github.io/earth2mip/

πŸ›[BUG][Feature Request]: Perturbing channels that are not included in `earth2mip/_channel_stds.py`

ankurmahesh opened this issue · comments

Version

source - main

On which installation method(s) does this occur?

Source

Describe the issue

When channels are perturbed, the perturbations are multiplied by `scale`:

https://github.com/NVIDIA/earth2mip/blob/main/earth2mip/inference_ensemble.py#L246-L253

However, `scale` is determined from the values in this file, not from the scales stored in the model's `Inference` object:

https://github.com/NVIDIA/earth2mip/blob/main/earth2mip/_channel_stds.py

I use a dataset that uses q, not r. Therefore, by the logic in the code segment above, all of the q perturbations are set to 0 because q is not in `_channel_stds.py`. This wasn't my intended behavior: I intended to perturb q.
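To illustrate, here is a minimal sketch of the behavior described above; the table contents and variable names are stand-ins, not the actual values or code in `inference_ensemble.py`:

```python
# Stand-in for the table in earth2mip/_channel_stds.py (values are illustrative).
channel_stds = {"t2m": 5.0, "r500": 25.0}

channels_to_perturb = ["t2m", "q500"]  # q500 is not in the table

# Channels missing from the table get a scale of 0, so their
# perturbations are silently zeroed out.
scale = [channel_stds.get(c, 0.0) for c in channels_to_perturb]
print(scale)  # [5.0, 0.0] -> the q500 perturbation is multiplied by 0
```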

Three possible solutions:

  1. We delete `_channel_stds.py` and instead use `model.scale` in the perturbation. The perturb method has access to the model, and `Inference` models have a `scale` attribute.
  2. If we keep the current logic, maybe we could add a logger warning that says "X channel is not perturbed" (see the sketch after this list). As more models get trained, I think it's likely that more variables will be added (e.g. more vertical levels). Since the scales are drawn from a separate file rather than from the `scales.npy` file in the model package, I think an alert would be useful.
  3. I could just add the q scales to `_channel_stds.py`, but I would still recommend (2) above as well.
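
A sketch of what option (2) could look like; the helper name `get_perturbation_scales` is hypothetical, and the table argument stands in for the contents of `earth2mip/_channel_stds.py`:

```python
import logging

logger = logging.getLogger(__name__)


def get_perturbation_scales(channels, channel_stds):
    """Hypothetical helper: look up per-channel scales, warning on misses."""
    scales = []
    for channel in channels:
        if channel not in channel_stds:
            # Option (2): make the silent zeroing visible to the user.
            logger.warning(
                "Channel %s is not in _channel_stds.py and will not be perturbed.",
                channel,
            )
        scales.append(channel_stds.get(channel, 0.0))
    return scales
```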

Environment details

I am running from source, currently at commit 86b11fe.

> 1. We delete the _channel_stds.py and instead use the model.scale in the perturbation.

Not all models have `.scale`. It is not part of the `TimeLoop` interface defined here:

```python
class TimeLoop(Protocol):
    """Abstract protocol that a custom time loop must follow"""
```

This is why we took steps to decouple the initialization from the model. There are also many potential ways to scale perturbations, e.g. scaling by climatological variance, and these should be applicable across models (even ones which don't have `.scale`).

Overall, these perturbation methods could use an overhaul and a refactor to a more modular design, e.g. one class per method. My solution would be to ask ChatGPT: "please refactor this if-statement with many clauses into one class per clause; each class should take a dictionary of channel scales as an argument to its constructor".
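For example, a minimal sketch of that one-class-per-method design (the class name, call signature, and default amplitude are illustrative, not existing earth2mip API):

```python
import torch


class GaussianChannelPerturbation:
    """Hypothetical perturbation class that takes its channel scales
    as a constructor argument rather than reading _channel_stds.py."""

    def __init__(self, channel_scales: dict, amplitude: float = 0.05):
        self.channel_scales = channel_scales
        self.amplitude = amplitude

    def __call__(self, x: torch.Tensor, channels: list) -> torch.Tensor:
        # x is assumed to be shaped (..., channel, lat, lon).
        scale = torch.tensor(
            [self.channel_scales.get(c, 0.0) for c in channels],
            dtype=x.dtype,
            device=x.device,
        ).view(-1, 1, 1)
        return x + self.amplitude * scale * torch.randn_like(x)
```

Other scaling strategies, e.g. scaling by climatological variance, would then just be additional classes with the same call signature, independent of whether a model exposes `.scale`.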

For now, (2) and (3) above would be nice.