hypertidy / ncmeta

Tidy NetCDF metadata

Home Page:https://hypertidy.github.io/ncmeta/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Plans to support extended grid mapping attributes (as defined by CF-1.7 and later)?

dschlaep opened this issue · comments

I would like to ask if you would consider enhancing ncmeta so that it would support extended grid mapping attributes as defined by CF-1.7 and later?

CF conventions before 1.7 required the grid_mapping attribute of a variable to be an exact match with the variable name that provided the grid mapping: e.g,

int crs ;
  crs:grid_mapping_name = "latitude_longitude" ;
double variable ;
  variable:grid_mapping = "crs" ;

This is what package ncmeta supports with the current implementation of the function ncmeta:::nc_grid_mapping_atts.data.frame():

grid_mapping_atts <- dplyr::filter(x, variable %in% grid_mapping_vars$value)

Newer versions of CF, however, expanded the concept of grid mapping starting with CF-1.7 to handle cases with more than one crs:

The attribute takes a string value with two possible formats. In the first format, it is a single word, which names a grid mapping variable. In the second format, it is a blank-separated list of words "grid_mapping_variable: coordinate_variable [coordinate_variable ...] [grid_mapping_variable: ...]", which identifies one or more grid mapping variables, and with each grid mapping associates one or more coordinate_variables, i.e. coordinate variables or auxiliary coordinate variables.

This is illustrated by “Example 5.10. British National Grid”

float temp(z, y, x) ;
  temp:grid_mapping = "crsOSGB: x y crsWGS84: lat lon" ;
int crsOSGB ;
  crsOSGB:grid_mapping_name = "transverse_mercator";
int crsWGS84 ;
  crsWGS84:grid_mapping_name = "latitude_longitude";

Since CF-1.8 the extended format of the grid_mapping attribute may also provide
coordinate axis order, i.e.,

The order of the <coordinatesVariable> references within the grid_mapping attribute definition defines the order of elements within a derived coordinate value tuple. 

This is illustrated by “Example 5.11. Latitude and longitude on the WGS 1984 datum + CRS WKT”

float data(latitude, longitude) ;
  data:grid_mapping = "crs: latitude, longitude" ;
int crs ;
  crs:grid_mapping_name = "latitude_longitude";

Unfortunately, the package ncmeta does currently not recognize the crs of such files. Because of the exact matching used by function ncmeta:::nc_grid_mapping_atts.data.frame(), it currently cannot locate the corresponding crs of a grid mapping attribute in the extended format. It seems that the function could be enhanced to support the extended format if it would use partial matches or matches against the substring(s) before colon(s) of the grid mapping attribute.

I recognize that additional design decisions would need to be taken for situations with multiple mappings, e.g., maybe pick the first and warn about the others?

A small reproducible example:

Function to create netCDFs illustrating simple and extended grid_mapping

create_ncdf_with_grid_mapping <- function(file, extended_grid_mapping = FALSE) {
  nc <- RNetCDF::create.nc(file)

  # Create x-y coordinates
  RNetCDF::dim.def.nc(nc, "lon", 5)
  RNetCDF::dim.def.nc(nc, "lat", 3)

  # Create coordinate variables
  RNetCDF::var.def.nc(nc, "lon", "NC_DOUBLE", "lon")
  RNetCDF::var.def.nc(nc, "lat", "NC_DOUBLE", "lat")

  # Create variable and add grid_mapping attribute
  RNetCDF::var.def.nc(nc, "var", "NC_DOUBLE", c("lon", "lat"))
  tmp_gm <- if (extended_grid_mapping) "crs: lat lon" else "crs"
  RNetCDF::att.put.nc(nc, "var", "grid_mapping", "NC_CHAR", tmp_gm)

  # Create grid_mapping and add grid_mapping_name attribute
  RNetCDF::var.def.nc(nc, "crs", "NC_INT", NA)
  RNetCDF::att.put.nc(nc, "crs", "grid_mapping_name", "NC_CHAR", "latitude_longitude")

  RNetCDF::close.nc(nc)

  invisible(file)
}

Create example netCDFs

f1 <- tempfile("extended_grid_mapping_", fileext = ".nc")
f2 <- tempfile("simple_grid_mapping_", fileext = ".nc")

create_ncdf_with_grid_mapping(f1, extended_grid_mapping = TRUE)
create_ncdf_with_grid_mapping(f2)

Reading the extended grid_mapping produces a (non-fatal) error that results in no crs

x1 <- stars::read_ncdf(f1, var = "var")

## Error in UseMethod("GPFN") : 
##   no applicable method for 'GPFN' applied to an object of class "list"

sf::st_crs(x1)

## Coordinate Reference System: NA

Reading the simple grid_mapping works as expected and results in the correct crs

x2 <- stars::read_ncdf(f2, var = "var")

## Warning in getGeoDatum(gm): Didn't find a longitude of prime meridian for datum,
## assuming 0.

## Warning in getGeoDatum(gm): Didn't find a semi major axis for datum, assuming
## WGS84 6378137.0 meters

## Warning in getGeoDatum(gm): Didn't find an inverse flattening value, assuming
## WGS84 298.257223563

sf::st_crs(x2)

## Coordinate Reference System:
##   User input: +proj=longlat +a=6378137 +f=0.00335281066474748 +pm=0 +no_defs 
##   wkt:
## GEOGCRS["unknown",
##     DATUM["unknown",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]],
##     PRIMEM["unknown",0,
##         ANGLEUNIT["degree",0.0174532925199433,
##             ID["EPSG",9122]]],
##     CS[ellipsoidal,2],
##         AXIS["longitude",east,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433,
##                 ID["EPSG",9122]]],
##         AXIS["latitude",north,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433,
##                 ID["EPSG",9122]]]]

Cleanup

unlink(c(f1, f2))

I used

ncmeta\_0.3.0, RNetCDF\_2.4-2, sf\_0.9-8, stars\_0.4-3
Linking to GEOS 3.9.1, GDAL 3.2.0, PROJ 7.2.1

Many thanks!

Looks like I missed the case where you might have a structured string listing multiple grid mappings. This should certainly be fixed. With what you've supplied here, I think there is enough to craft a good test for this behavior.

Question, would you be ok if ncmeta only returned the first or should it return all that it can find in the case that this extended format is used? It's been quite a while since I looked at this code -- so it's going to take a bit for me to reorient.

That would be great! My specific use case that lead me to this issue was a grid mapping specifying axis order (of a netCDF with just one grid mapping).

So, I would be ok if ncmeta only returned the first grid mapping of an extended format with multiple grid mappings. That said, it is not clear to me how the rest of ncmeta (and rev-dependencies, e.g., stars) would currently treat a netCDF with multiple grid mappings -- I didn't have time yet to test this out, e.g., based on “Example 5.10. British National Grid”; maybe such netCDFs currently don't work for multiple reasons?

However, one idea in an attempt to future-proof the function ncmeta:::nc_grid_mapping_atts.data.frame() would be to add a new argument, e.g., function (x, data_variable = NULL, grid_mapping = NULL) that would allow a user to specify one of the available grid mappings, with a default picking the first, e.g.,

nc_grid_mapping_atts.data.frame <- function(x, data_variable = NULL, grid_mapping = NULL) {
  ...
  grid_mapping_vars <- dplyr::filter(...
  if (is.null(grid_mapping)) {
    grid_mapping <- find_first_grid_mapping(grid_mapping_vars$value)
  }
  grid_mapping_atts <- dplyr::filter(x, variable %in% grid_mapping)
  if (nrow(grid_mapping_atts) == 0) {
    warning(paste("no grid_mapping attribute with name", shQuote(grid_mapping), "found"))
    return(tibble::tibble())
  }
  grid_mapping_atts <- dplyr::left_join(...
  ...
}

and

find_first_grid_mapping <- function(grid_mapping_str) {
  strsplit(grid_mapping_str, split = ":", fixed = TRUE)[[1]][1]
}

with

library("testthat")

expect_equal(find_first_grid_mapping("crs"), "crs")

# CF-1.7: "Example 5.10. British National Grid"
expect_equal(
  find_first_grid_mapping("crsOSGB: x y crsWGS84: lat lon"), 
  "crsOSGB"
)

# CF-1.8: "Example 5.11. Latitude and longitude on the WGS 1984 datum + CRS WKT"
expect_equal(find_first_grid_mapping("crs: latitude, longitude"), "crs")