obidam / pyxpcm

A Python implementation of Profile Classification Modelling (PCM) for xarray

Home Page: https://pyxpcm.readthedocs.io


Unable to fit the netCDF4 file data using pyXpcm

Priyanshu-Malik opened this issue · comments

I don't know whether the netCDF4 file (sst2010.nc) is being read correctly here while working with pyXpcm.

The code

m.fit(ds, features=features_in_ds, dim=features_zdim)
m

gives the following error:

xarray.DataArray vertical axis is not deep enough for this PCM axis [0.51 > 0.00]

Can you take a look and tell me what I am missing here?

Attaching the code here:
CODE 1
CODE 2

Hi @Priyanshu-Malik,
The vertical axis must be negative and oriented downward, so in your case you could do:

ds['deptht'] = -np.abs(ds['deptht'])

Then, to avoid interpolation (this is a large array, so interpolation would take some time), you can use the dataset's own vertical axis as the PCM axis:

z = ds['deptht'].values[0:40]
pcm_features = {'temperature': z}

Here I cut the axis to the first 500 m of the water column (the first 40 levels).
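The two steps above (flip the depth sign, then keep only the upper 500 m) can be sketched with plain numpy. The depth values and variable names below are hypothetical, and selecting by depth value rather than a hardcoded index like `[0:40]` makes the cut robust to a different number of levels:

```python
import numpy as np

# Hypothetical positive-down depth axis, e.g. ~0 to 1000 m over 75 levels
deptht = np.linspace(0.5, 1000.0, 75)

# pyXpcm expects a negative, downward-oriented vertical axis
deptht_neg = -np.abs(deptht)

# Keep only the upper 500 m for the PCM axis (select by value, not index)
z = deptht_neg[deptht_neg >= -500.0]

pcm_features = {'temperature': z}
print(len(z), z[0], z[-1])
```

In the actual workflow, `deptht` would come from `ds['deptht'].values` after the sign flip shown above.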

and then you can:

features_in_ds = {'temperature': 'votemper'}
m.fit_predict(ds, features=features_in_ds, inplace=True)

On my laptop, this took about 37 minutes to run with a PCM of K=12 classes:

ds['PCM_LABELS'].isel(time_counter=0).plot(x='x')

(resulting map of PCM_LABELS attached)

It worked like a charm, I can't thank you and pyxpcm enough. It took about half an hour to run on my laptop as well, and I was able to follow the rest of the tutorial with ease from that point onward, getting all the plots and graphs.

However, 'votemper' and 'PCM_LABELS' show all values as 'nan' (attached below) after running m.fit_predict(...). This might be a property of the dataset itself rather than an error, but I would really like to hear your take on it.

One of the plots:
(two screenshots attached, 2022-07-06)

Great!
Those NaNs are just from the printed sample of the large array.
With this, I think we can close this issue.

Hello Sir, I have been working on the same dataset for a week now. I have to do this for 5 years, so I downloaded the three-year data first, making the total array size about 5 GB. The m.fit_predict(ds, features=features_in_ds, inplace=True) command now takes forever to run, along with a warning that says: "Slicing is producing a large chunk".

(screenshot of the warning attached)
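That warning generally means dask is being asked to materialize chunks far larger than a memory-friendly size. A quick back-of-the-envelope check of the chunk footprint (pure Python; the dimension sizes below are hypothetical stand-ins for your `votemper(time_counter, deptht, y, x)` array) can guide what to pass as `chunks=` when opening the files:

```python
# Estimate the in-memory size of one dask chunk for a float32 array.
# Dimension sizes below are hypothetical; replace them with your dataset's.

def chunk_bytes(chunk_shape, itemsize=4):
    """Bytes needed to hold one chunk of the given shape."""
    n = 1
    for s in chunk_shape:
        n *= s
    return n * itemsize

# e.g. votemper(time_counter, deptht, y, x): ~3 years daily, 40 levels
full_shape = (1095, 40, 500, 600)
one_big_chunk = chunk_bytes(full_shape)          # everything in one chunk
per_timestep = chunk_bytes((1, 40, 500, 600))    # chunked along time

print(f"single chunk: {one_big_chunk / 1e9:.1f} GB")
print(f"per time step: {per_timestep / 1e6:.1f} MB")
```

With numbers like these, opening the dataset with something like `xarray.open_dataset(..., chunks={'time_counter': 1})` (or `open_mfdataset` across files) lets dask work on tens-of-MB pieces instead of one multi-GB array. This is a sketch under assumed dimension names; check your files for the actual ones.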

If possible, can you answer these important queries?

  1. Is there a workaround for running the model on a large dataset? If not, just answer the second question, which is more important for me.
  2. How do I determine the optimum number of classes (the K value)? The tutorial does not show any way to find K using the BIC elbow method; we just took an arbitrary value, K=12. Can you please tell me how to find the K value suitable for my dataset using pyXpcm?
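The BIC elbow idea can be sketched independently of pyXpcm with scikit-learn's GaussianMixture (a PCM is a Gaussian Mixture Model under the hood, so the criterion is the same): fit the model for a range of K values, compute the BIC for each, and pick the K where the curve bottoms out or stops decreasing sharply. The synthetic data below stand in for the (n_profiles, n_levels) feature matrix and are purely illustrative:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-in for the fitted feature matrix: three well-separated
# artificial "profile types" in 5 dimensions, 200 samples each.
X = np.vstack([
    rng.normal(loc=m, scale=0.5, size=(200, 5))
    for m in (0.0, 3.0, 6.0)
])

# Fit a GMM for each candidate K and record its BIC (lower is better)
K_range = range(1, 9)
bics = [GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in K_range]

best_K = list(K_range)[int(np.argmin(bics))]
print("BIC per K:", [round(b) for b in bics])
print("K with lowest BIC:", best_K)
```

If I recall correctly, a fitted pyXpcm model also exposes a `bic()` method you could call in the same kind of loop over K directly on your dataset, but please check the pyXpcm API documentation to confirm the exact signature.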