hypertidy / ncmeta

Tidy NetCDF metadata

Home Page: https://hypertidy.github.io/ncmeta/

nc_coord_var divination

mdsumner opened this issue

@dblodgett-usgs

I'm filing this with a question mark as I'm not sure it's the cause, but I see a problem in the divining of XYZT at these lines:

ncmeta/R/nc_coord.R, lines 97 to 102 at ce8ca20:

coord_var <- sapply(coord_vars, divine_XYZT,
                    atts = filter(att, variable %in% coord_vars),
                    simplify = FALSE)
coord_var_base <- tibble::as_tibble(list(coord_var = names(coord_var),
                                         axis = unlist(coord_var)))

> list(coord_var = names(coord_var),
+      axis = unlist(coord_var))
$coord_var
[1] "time" "zlev" "lat"  "lon" 

$axis
time  lat  lon 
 "T"  "Y"  "X" 

zlev is missing
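For what it's worth, the mechanics of the failure look like plain base R: assuming divine_XYZT() returns NULL for a variable it can't classify (which is my reading of the snippet above), unlist() silently drops those entries, so the names and the axes fall out of step - and when nothing classifies at all, the axis column is NULL outright, which is exactly the tibble error below.

## Sketch of the failure mode, assuming divine_XYZT() yields NULL for
## unclassifiable coordinate variables.
coord_var <- list(time = "T", zlev = NULL, lat = "Y", lon = "X")
names(coord_var)   # "time" "zlev" "lat"  "lon"  -- length 4
unlist(coord_var)  # time/lat/lon = "T" "Y" "X"  -- length 3, zlev dropped

## And if no variable classifies at all, unlist() returns NULL:
unlist(list(zlev = NULL))  # NULL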

I've attached a zipped file that shows the problem:

> nc_coord_var("avhrr-only-v2.19810901.nc")
Error: All columns in a tibble must be 1d or 2d objects:
* Column `axis` is NULL

avhrr-only-v2.19810901.nc.zip

Note that more recent files from OISST v2 don't exhibit this problem, presumably because they have a more standard form. These files exist in two versions; reduced.nc was derived from this one, but must have become more sensible when it was created.

Ah, it's the divining logic:

when file is "www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/access/avhrr-only/198109/avhrr-only-v2.19810901.nc"

$long_name
[1] "Sea surface height"

$units
[1] "meters"

$actual_range
[1] "0, 0"

when file is "www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/access/avhrr-only/201901/avhrr-only-v2.20190106_preliminary.nc"

$long_name
[1] "Sea surface height"

$units
[1] "meters"

$positive
[1] "down"

$actual_range
[1] "0, 0"

That's scary stuff, like "well, we have x, y, t - and this other thing, maybe it's z?" Especially since this dimension is both degenerate and non-unlimited, and never actually used in practice - perhaps that's the logic? I'm length 1, I'm not unlimited - so should we:

  • forget it, effectively drop to (n - 1) dims?
  • assume it is the remaining one of 4?

I feel that the former is the right way.
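To make the two options concrete, here's a hypothetical sketch - not the ncmeta implementation, and the function names are mine - operating on the named list that divine_XYZT() produces:

## Option 1: drop anything that couldn't be classified, i.e. (n - 1) dims.
drop_unknown <- function(coord_var) {
  keep <- !vapply(coord_var, is.null, logical(1))
  tibble::tibble(coord_var = names(coord_var)[keep],
                 axis = unname(unlist(coord_var[keep])))
}

## Option 2: if exactly one variable and exactly one of X/Y/Z/T are
## unaccounted for, assume they match; otherwise fall back to dropping.
infer_remaining <- function(coord_var) {
  unknown <- vapply(coord_var, is.null, logical(1))
  missing_axis <- setdiff(c("X", "Y", "Z", "T"), unlist(coord_var))
  if (sum(unknown) == 1 && length(missing_axis) == 1) {
    coord_var[unknown] <- missing_axis
  }
  drop_unknown(coord_var)
}

cv <- list(time = "T", zlev = NULL, lat = "Y", lon = "X")
drop_unknown(cv)     # 3 rows: time/T, lat/Y, lon/X
infer_remaining(cv)  # 4 rows: zlev picks up the leftover "Z"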

ncdump headers for the two files (the 1981 file first, then the 2019 file):

dimensions:
        time = 1 ;
        zlev = 1 ;
        lat = 720 ;
        lon = 1440 ;
variables:
        float time(time) ;
                time:long_name = "Center time of the day" ;
                time:units = "days since 1978-01-01 00:00:00" ;
        float zlev(zlev) ;
                zlev:long_name = "Sea surface height" ;
                zlev:units = "meters" ;
                zlev:actual_range = "0, 0" ;
        float lat(lat) ;
                lat:long_name = "Latitude" ;
                lat:units = "degrees_north" ;
                lat:grids = "Uniform grid from -89.875 to 89.875 by 0.25" ;
        float lon(lon) ;
                lon:long_name = "Longitude" ;
                lon:units = "degrees_east" ;
                lon:grids = "Uniform grid from 0.125 to 359.875 by 0.25" ;
        short sst(lon, lat, zlev, time) ;
                sst:long_name = "Daily sea surface temperature" ;
                sst:units = "degrees C" ;
                sst:_FillValue = -999 ;
                sst:add_offset = 0 ;
                sst:scale_factor = 0.01 ;
                sst:valid_min = -300 ;
                sst:valid_max = 4500 ;
        short anom(lon, lat, zlev, time) ;
                anom:long_name = "Daily sea surface temperature anomalies" ;
                anom:units = "degrees C" ;
                anom:_FillValue = -999 ;
                anom:add_offset = 0 ;
                anom:scale_factor = 0.01 ;
                anom:valid_min = -1200 ;
                anom:valid_max = 1200 ;
        short err(lon, lat, zlev, time) ;
                err:long_name = "Estimated error standard deviation of analysed_sst" ;
                err:units = "degrees C" ;
                err:_FillValue = -999 ;
                err:add_offset = 0 ;
                err:scale_factor = 0.01 ;
                err:valid_min = 0 ;
                err:valid_max = 1000 ;
        short ice(lon, lat, zlev, time) ;
                ice:long_name = "Sea ice concentration" ;
                ice:units = "percentage" ;
                ice:_FillValue = -999 ;
                ice:add_offset = 0 ;
                ice:scale_factor = 0.01 ;
                ice:valid_min = 0 ;
                ice:valid_max = 100 ;

// global attributes:
                :Conventions = "CF-1.0" ;
                :title = "Daily-OI-V2, final, Data (Ship, Buoy, AVHRR, GSFC-ice)" ;
                :History = "Version 2.0" ;
                :creation_date = "2011-05-04" ;
                :Source = "NOAA/National Climatic Data Center" ;
                :Contact = "Dick Reynolds, email: Richard.W.Reynolds@noaa.gov & Chunying Liu, email: Chunying.liu@noaa.gov" ;

dimensions:
        time = UNLIMITED ; // (1 currently)
        zlev = 1 ;
        lat = 720 ;
        lon = 1440 ;
variables:
        float time(time) ;
                time:long_name = "Center time of the day" ;
                time:units = "days since 1978-01-01 00:00:00" ;
        float zlev(zlev) ;
                zlev:long_name = "Sea surface height" ;
                zlev:units = "meters" ;
                zlev:positive = "down" ;
                zlev:actual_range = "0, 0" ;
        float lat(lat) ;
                lat:long_name = "Latitude" ;
                lat:units = "degrees_north" ;
                lat:grids = "Uniform grid from -89.875 to 89.875 by 0.25" ;
        float lon(lon) ;
                lon:long_name = "Longitude" ;
                lon:units = "degrees_east" ;
                lon:grids = "Uniform grid from 0.125 to 359.875 by 0.25" ;
        short sst(lon, lat, zlev, time) ;
                sst:long_name = "Daily sea surface temperature" ;
                sst:units = "Celsius" ;
                sst:_FillValue = -999 ;
                sst:add_offset = 0 ;
                sst:scale_factor = 0.01 ;
                sst:valid_min = -300 ;
                sst:valid_max = 4500 ;
        short anom(lon, lat, zlev, time) ;
                anom:long_name = "Daily sea surface temperature anomalies" ;
                anom:units = "Celsius" ;
                anom:_FillValue = -999 ;
                anom:add_offset = 0 ;
                anom:scale_factor = 0.01 ;
                anom:valid_min = -1200 ;
                anom:valid_max = 1200 ;
        short err(lon, lat, zlev, time) ;
                err:long_name = "Estimated error standard deviation of analysed_sst" ;
                err:units = "Celsius" ;
                err:_FillValue = -999 ;
                err:add_offset = 0 ;
                err:scale_factor = 0.01 ;
                err:valid_min = 0 ;
                err:valid_max = 1000 ;
        short ice(lon, lat, zlev, time) ;
                ice:long_name = "Sea ice concentration" ;
                ice:units = "%" ;
                ice:_FillValue = -999 ;
                ice:add_offset = 0 ;
                ice:scale_factor = 0.01 ;
                ice:valid_min = 0 ;
                ice:valid_max = 100 ;

// global attributes:
                :Conventions = "CF-1.6" ;
                :title = "NCEI Daily-OISST-V2 based mainly on AVHRR, Interim" ;
                :history = "Version 2.0" ;
                :creation_date = "2019-01-07 06:36" ;
                :description = "Reynolds, et al.(2007) Daily High-resolution Blended Analyses. Available at http://journals.ametsoc.org/doi/abs/10.1175/2007JCLI1824.1. Climatology is based on 1971-2000 OI.v2 SST, Satellite data: Navy NOAA-19 METOP-A AVHRR, Ice data: NCEP ice" ;
                :source = "NOAA/National Centers for Environmental Information" ;
                :contact = "oisst-help, email: oisst-help@noaa.gov" ;

Stared at this for a little while and... there's no way to positively identify zlev as a Z coordinate. The only thing you can really go on is the "positive" or "axis" attributes, which aren't present in this file.
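A check along those lines might look like the following (a hedged sketch; the helper name is mine). Per CF, axis = "Z" or a positive attribute of "up"/"down" marks a vertical coordinate:

## Hypothetical CF-style test for a vertical coordinate, relying only on
## the "axis" and "positive" attributes discussed above.
is_z_coord <- function(atts) {
  identical(atts[["axis"]], "Z") ||
    isTRUE(tolower(atts[["positive"]]) %in% c("up", "down"))
}

is_z_coord(list(long_name = "Sea surface height", units = "meters"))  # FALSE (1981 file)
is_z_coord(list(units = "meters", positive = "down"))                 # TRUE  (2019 file)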

So, I agree the right thing to do here is just to drop it. This is basically saying that we can't figure out what that coordinate variable is without somewhat sketchy inference. I'll get a PR in quickly unless you already see how to drop it?

On second thought, given that there are variables here with four dimensions and we DO have four coordinate variables, I could see an argument to make the inference and fill in the missing Z coordinate. I'll fiddle with it and get you a PR sometime tomorrow -- it's late here.

Thanks! I'm definitely out of my depth here, very happy to take your PRs. I still have some concerns, since we might have 4 dimensions with 4 coordinate variables that have nothing to do with any real-world concept of X, Y, Z, T - and I'm a little confused about why such a strong canonical form of them needs to exist. But I'll just keep testing and follow your lead ;)

Oh, just a note - it's new that ncmeta is now sort of disruptive within stars, but the nice thing is that there's no internal inconsistency within ncmeta itself, because all it does is provide functions that return these entities - and that's the way it should be. If some source fails to make sense for one of these getters, then that's an empty slot or a missing feature; there's no reason it should be a problem in a downstream package. stars is "meaning-agnostic" in some important ways (not all), so if XY[[Z]T] is formally definable by these heuristics, so much the better - otherwise stars can fall back to whatever is provided by name and order.

I don't have strong opinions on interpretation, but I do believe we should always be able to read the data, and accept or override interpretation as needed.

Agreed @mdsumner. I think the need here is around automation -- tools like Panoply (NetCDF-Java) only work because they've implemented numerous metadata constructs that allow a common data model to understand how to map various datasets' concepts of spatio-temporal dimensions. Sorry my work here has been a bit clumsy; hopefully this is helpful in the end. I'll spend some more time this morning and hopefully we can find a good balance of specificity and flexibility.