DFO-CHS-Dynamic-Hydrographic-Products / IWLS_pygeoapi

pygeoapi plugins to access and process water level and surface currents from the IWLS public API

Attribute Metadata bug fixes

princessmittens opened this issue · comments

As discussed previously, the S104 and S111 products should share the group metadata structure outlined below by @glmcr.

In addition to this there are redundant attributes:
northBoundLongitude- to be removed
southBoundLongitude- to be removed
numInstance - redundant (should be numInstances but missing an 's' so a new attribute was created, likely a typo)
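
A minimal sketch of the cleanup listed above, written against any dict-like attribute container so it can be tried without an HDF5 file. The helper name `clean_attrs` and the sample values are hypothetical; with h5py, the `attrs` object of a group supports the same `in`, item assignment, and `del` operations, so it could be passed in place of the plain dict.

```python
def clean_attrs(attrs, redundant=("northBoundLongitude", "southBoundLongitude")):
    """Drop redundant attributes and fold the misspelled numInstance into numInstances.

    `attrs` is any mutable mapping (a plain dict here; assumed to behave like
    an h5py group's `.attrs`). Returns the names that were removed.
    """
    removed = []
    if "numInstance" in attrs:
        # Keep the value only if the correctly spelled attribute is missing.
        if "numInstances" not in attrs:
            attrs["numInstances"] = attrs["numInstance"]
        del attrs["numInstance"]
        removed.append("numInstance")
    for name in redundant:
        if name in attrs:
            del attrs[name]
            removed.append(name)
    return removed


# Made-up attribute values, for illustration only.
example = {
    "numInstance": 2,
    "numInstances": 2,
    "northBoundLongitude": 0.0,
    "southBoundLongitude": 0.0,
    "eastBoundLongitude": -123.0,
}
removed = clean_attrs(example)
print(removed)
print(example)
```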

Use the same structure as the S104Dcf8:
---------------------------------------

    If one HADCP has more than one measurement point ("bin") associated with it then
    it can be considered as a "station" and should have its own "Group_nnn" structure.

    e.g. Woodward's Landing would have 3 bins (three measurement points with different coordinates
    in double precision so that they can be displayed as distinct points in a GUI
    viz app. in full zoom)

    Note that the "Hydrodynamic model forecast" current type would be interpolated
    at the coordinates of the station "bin" points (when it is possible to do so)

    ...


    ATTRIBUTE "numInstances" {
        DATATYPE  H5T_STD_I32LE
        DATASPACE  SCALAR
        DATA {
        (0): 2
        }
    }
    ...

    GROUP "SurfaceCurrent.01" {
       ...

       ATTRIBUTE "typeOfCurrentData" {

           ...
           DATASPACE  SCALAR
           DATA {
           (0): Real-time observation
          }
       }
       ...

       ATTRIBUTE "numberOfStations" {
            DATATYPE  H5T_STD_I64LE
            DATASPACE  SCALAR
            DATA {
            (0): 3
            }
       }
       ...

       GROUP Group_001 {
           ...

           ATTRIBUTE "stationName" {
              ...

              DATASPACE  SCALAR
              DATA {
               (0): "Woodward's Landing: bin.001"
              }       
           }
           ...
        }
        GROUP Group_002 {
            ...

            ATTRIBUTE "stationName" {
                ...

               DATASPACE  SCALAR
               DATA {
               (0): "Woodward's Landing: bin.002"
               }
            }
            ...
        }
        GROUP Group_003 {
            ...

            ATTRIBUTE "stationName" {
                ...

               DATASPACE  SCALAR
               DATA {
               (0): "Woodward's Landing: bin.003"
               }
            }
            ...
        }
        GROUP "Positioning" {
          DATASET "geometryValues" {
             DATATYPE  H5T_COMPOUND {
                H5T_IEEE_F64LE "latitude";
                H5T_IEEE_F64LE "longitude";
             }
             DATASPACE  SIMPLE { ( 3 ) / ( 3 ) }
          ...         
        }
    } 
    GROUP "SurfaceCurrent.02" {
       ...

       ATTRIBUTE "typeOfCurrentData" {

           ...
           DATASPACE  SCALAR
           DATA {
           (0): Hydrodynamic model forecast
          }
       }
       ...

       ATTRIBUTE "numberOfStations" {
            DATATYPE  H5T_STD_I64LE
            DATASPACE  SCALAR
            DATA {
            (0): 3
            }
       }
       ...

       GROUP Group_001 {
           ...

           ATTRIBUTE "stationName" {
              ...

              DATASPACE  SCALAR
              DATA {
               (0): "Woodward's Landing: bin.001"
              }
           }
           ...
        }
        GROUP Group_002 {
            ...

            ATTRIBUTE "stationName" {
                ...

               DATASPACE  SCALAR
               DATA {
               (0): "Woodward's Landing: bin.002"
               }
            }
            ...
        }
        GROUP Group_003 {
            ...

            ATTRIBUTE "stationName" {
                ...

               DATASPACE  SCALAR
               DATA {
               (0): "Woodward's Landing: bin.003"
               }
            }
            ...
        }
        GROUP "Positioning" {
          DATASET "geometryValues" {
             DATATYPE  H5T_COMPOUND {
                H5T_IEEE_F64LE "latitude";
                H5T_IEEE_F64LE "longitude";
             }
             DATASPACE  SIMPLE { ( 3 ) / ( 3 ) }
          ...         
       }
    } 
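
The Group_nnn / stationName naming used in the dump above can be sketched as follows. The helper name `bin_groups` is hypothetical; only the `Group_001` and `"<station>: bin.001"` patterns come from the dump itself.

```python
def bin_groups(station, n_bins):
    """Map each Group_nnn name to the stationName of one HADCP bin (1-based)."""
    return {
        f"Group_{i:03d}": f"{station}: bin.{i:03d}"
        for i in range(1, n_bins + 1)
    }


groups = bin_groups("Woodward's Landing", 3)
# numberOfStations would then simply be the number of bins.
number_of_stations = len(groups)

for group_name, station_name in groups.items():
    print(group_name, "->", station_name)
```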
commented

The "numInstances" attribute must be located in the "/SurfaceCurrent" GROUP.

Yes, but there are two right now:
numInstance and numInstances. There should be only one, namely numInstances.

Additionally: Group_F/featureCode in S111 has the value 'WaterLevel'; it should be 'SurfaceCurrent'.

@glmcr and @maximecarre1

Small detail, but I was wondering what you both think the standard should be: in Max's Group_F datasets, the strings in the featureCode and SurfaceCurrent/WaterLevel groups are byte-encoded and have to be decoded. In dhp_enav (our version) they are not encoded. I took a quick look at the spec and didn't see anything related to this, aside from it being a dataset with the assigned types listed in it.

For example in Max's implementation it's:

[(b'surfaceCurrentSpeed', b'Surface current speed', b'knots', b'-1.0', b'H5T_FLOAT', b'0.0', b'[]', b'geSemiInterval'), (b'surfaceCurrentDirection', b'Surface current direction', b'arc-degrees', b'-1.0', b'H5T_FLOAT', b'0.0', b'360', b'geLtInterval'), (b'surfaceCurrentTime', b'Surface current time', b'DateTime', b'', b'H5T_STRING', b'19000101T000000Z', b'21500101T000000Z', b'closedInterval')]

The dhp_enav version does not have the byte encoding
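
For readers comparing the two outputs, a minimal sketch of the decoding step that the byte-encoded rows currently require. The helper name `decode_rows` is hypothetical, and UTF-8 is assumed as the encoding; the sample row is copied from the output above.

```python
def decode_rows(rows, encoding="utf-8"):
    """Return the rows with every bytes value decoded to str; other values pass through."""
    return [
        tuple(v.decode(encoding) if isinstance(v, bytes) else v for v in row)
        for row in rows
    ]


encoded = [
    (b"surfaceCurrentSpeed", b"Surface current speed", b"knots", b"-1.0",
     b"H5T_FLOAT", b"0.0", b"[]", b"geSemiInterval"),
]
decoded = decode_rows(encoded)
print(decoded[0])
```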

commented

Good point @princessmittens. I would say that we need to be consistent across our CHS products, so we would not use the byte encoding. On the other hand, I am not completely sure, but I think that it does not make any difference for the GUI viz apps.

commented

Also, is this byte encoding mainly for displaying web pages (i.e. Unicode handling)? If so, since what we do is not related to web pages at all, there is no need for byte encoding of the strings in our products.

commented

But also, I do not understand why the Group_F content is defined in the code, since it is something that is static in the file templates. We have no such code related to Group_F on the gpsc.science.gc.ca side.

commented

I know that at some point we will probably use variable time intervals for DCF8, so the Group_F would be different from the one for the constant time intervals type (as we have now), but we would simply have to use specific DCF8 file templates depending on the type of time intervals used.

commented

On the other hand, nothing forbids us from using the variable time interval type even for constant time intervals: since these are only 1D time series, the somewhat larger file size should not really be an issue nowadays. We would have just two DCF8 file templates to manage, and there would also be no need for missing data flags at all for observations (it would probably also be easier to compute WL trends and min/max values for both WLs and currents).

The string encoding in the metadata is a holdover from when we were using a previous version of h5py, where variable length strings were only created from Python byte strings. In the most recent version this is the default behavior. See:

https://docs.h5py.org/en/stable/strings.html
vs
https://docs.h5py.org/en/2.8.0/strings.html

We can safely change them if we want to stay consistent.

This only concerns the conversion done by h5py; all the strings in the files themselves are encoded in UTF-8.

commented

Ok thanks for the info Maxime. So if the hardcoded content of the Group_F is really used then we can drop the leading 'b' characters for it and also everywhere else in the code.

But the question is more about the existence of this hardcoded Group_F in the code: is it still relevant, considering that it can already be "hardcoded" in the file templates themselves? (Whether or not we decide to have file templates for both constant and variable time intervals.)

commented

I will create an issue about experimenting with the variable time intervals for DCF8 products and assign it to myself.

Group_F is only in the template, not the code; it never needs to be updated. With the latest version of h5py, regular strings are encoded correctly regardless of whether they are passed as byte strings or not, but Group_F is a compound of multiple strings that requires a custom numpy datatype; I think that might be the issue there.

I'll look into it and update the template if needed.

commented

Ok good. Since we are only using the constant time intervals DCF8 type for now, there is no need for two additional DCF8 file templates for variable time intervals. We will see later, with the experiments done for #13, whether we choose to use only the variable time intervals type (even with data having constant time intervals).

I forgot about this, but I ran into this issue a while ago, where it was encoding strings as byte strings by default. I think at the time we were not using byte strings, so I had to cast it as np.string(...) every time. Really strange.

commented

The string encoding no longer seems to be an issue with the newer h5py versions.

One more thing:

S111: the axisNames attribute is missing under the SurfaceCurrent group. S104 has it, so we just need to alter the S111 template. Note that the S104 axisNames are encoded strings.

commented

Okay thanks, good catch for the missing axisNames attr. in the S111 DCF8 file template.

What is the h5py version you are using exactly?

I've tested creating custom numpy data types for HDF5 compounds with regular strings instead of byte strings in the current version of h5py; it doesn't look like it's causing any issues. I will update Group_F in the template.
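
A minimal sketch of that kind of test, assuming h5py 3.x and numpy are installed. It is not the actual template code: the field names (`code`, `name`, `uom`), the reduced column set, and the file name are assumptions for illustration. It writes a Group_F style compound dataset from regular Python strings using a variable-length UTF-8 string type, into an in-memory file so nothing is left on disk.

```python
import numpy as np
import h5py

# Variable-length UTF-8 string type; fields are filled with plain Python str values.
str_dt = h5py.string_dtype(encoding="utf-8")

# Hypothetical subset of the Group_F columns (field names are assumptions).
row_dt = np.dtype([("code", str_dt), ("name", str_dt), ("uom", str_dt)])

rows = np.array(
    [
        ("surfaceCurrentSpeed", "Surface current speed", "knots"),
        ("surfaceCurrentDirection", "Surface current direction", "arc-degrees"),
    ],
    dtype=row_dt,
)

# driver="core" with backing_store=False keeps the file in memory only.
with h5py.File("group_f_sketch.h5", "w", driver="core", backing_store=False) as f:
    ds = f.create_group("Group_F").create_dataset("SurfaceCurrent", data=rows)
    stored = ds[()]


def as_text(value):
    """Decode bytes to str; values read back may be bytes depending on the h5py version."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


codes = [as_text(v) for v in stored["code"]]
print(codes)
```

Passing `str` values on the write side is what removes the leading `b` from the source; the `as_text` helper is only there because the type returned on read is not assumed here.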