NVIDIA / earth2mip

Version

source-main

On which installation method(s) does this occur?

Source, Pip

Describe the issue

I tried running graphcast_operational to obtain deterministic scores using inference_medium_range. However, I observed that there is a shape mismatch between input and output variables of the model as there is an additional variable "total precipitation 6 hour" (tp06) in the output channels of the model. This mismatch results into error in running score_deterministic module for graphcast_operational.

Code example

import datetime
from earth2mip.networks import get_model
from earth2mip.initial_conditions import cds
from earth2mip.inference_ensemble import run_basic_inference
from earth2mip.inference_medium_range import score_deterministic
import numpy as np
time_loop = get_model("e2mip://graphcast_operational", device="cuda:0")
data_source = cds.DataSource(time_loop.in_channel_names)
#ds = run_basic_inference(time_loop, n=5, data_source=data_source, time=datetime.datetime(2018, 1, 1))
scores = score_deterministic(time_loop,data_source=data_source,n=3,initial_times=[datetime.datetime(2018, 1, 1)],time_mean=np.zeros((len(time_loop.in_channel_names), 721, 1440)))
print(scores)

Error

Traceback (most recent call last): File "/workspace/graphcast_test.py", line 14, in scores = score_deterministic(time_loop,data_source=data_source,n=3,initial_times=[datetime.datetime(2018, 1, 1)],time_mean=np.zeros((len(time_loop.in_channel_names), 721, 1440))) File "/usr/local/lib/python3.10/dist-packages/earth2mip/inference_medium_range.py", line 219, in score_deterministic save_scores( File "/usr/local/lib/python3.10/dist-packages/earth2mip/inference_medium_range.py", line 289, in save_scores run_forecast( File "/usr/local/lib/python3.10/dist-packages/earth2mip/inference_medium_range.py", line 122, in run_forecast channels = [ File "/usr/local/lib/python3.10/dist-packages/earth2mip/inference_medium_range.py", line 123, in data_source.channel_names.index(name) for name in model.out_channel_names ValueError: 'tp06' is not in list

Environment details

OS platform and distribution: Ubuntu 22.04.3 LTS (x86x64)
PyTorch version: 2.1.0a0+32f93b1
Python version:3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] (64-bit runtime)
CUDA/cuDNN version:12.2.140
GPU models and configuration:NVIDIA H100 80GB HBM3

Modulus version:23.11
How Modulus is used (Docker image/ bare-metal installation):Docker from nvcr.io
(If using Modulus Docker image): Exact docker-run command: I am running on Eos with the --container-image argument to provide details of Nvidia's modulus docker.

I believe this is fixed by #154.

I believe so, seems to be solving the issue. The original source is from get_initial_condition_for_model always using in_channels in the original.

Maybe @pgarg7 you can look at the method implemented and add this?

earth2mip/earth2mip/inference_medium_range.py

Line 151 in 6696517

verification_torch = initial_conditions.get_data_from_source(

earth2mip/earth2mip/initial_conditions/__init__.py

Line 58 in 6696517

def get_data_from_source(

I would personally be for switching out the three paramters:

time: datetime.datetime,
time_levels: int,
time_step: datetime.timedelta = datetime.timedelta(hours=0),

for just one:

times: Union[datetime.datetime, list[datetime.datetime]]

I would personally be for switching out the three paramters:

time: datetime.datetime,
time_levels: int,
time_step: datetime.timedelta = datetime.timedelta(hours=0),

for just one:

times: Union[datetime.datetime, list[datetime.datetime]]

Not opposed, but can we avoid the union type? It add some convenience for interactive use, but also some ambiguity. For example, what would the shape of the output be in the datetime.datetime case? one could argue it shouldn't be (B, C, ...) rather than (B, history, C, ...).

I agree! I can give a shot to implement the method mentioned above and see if it improves the generalizability of the scoring module.

Not opposed, but can we avoid the union type?

I'm for it, the union was there to just allow some slight convenience for a single time. But just requiring list[datatime] I think is fine, also makes the API more clear.