OSGeo / PROJ-data

Repository for proj datum grids (for use by PROJ 7 or later)

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Add NGA EGM 2008 1x1 grid_alternative entry

msmitherdc opened this issue · comments

In order to account for the missing Und_min1x1_egm2008_isw=82_WGS84_TideFree.gz file in the CDN, we need to add a grid_alternative entry to point to the 1' geotiff file.

('Und_min1x1_egm2008_isw=82_WGS84_TideFree.gz','us_nga_egm2008_1.tif','egm08_1.gtx','GTiff','geoid_like',0,NULL,'https://cdn.proj.org/us_nga_egm2008_1.tif',1,1,NULL)

1' file can be found at https://grid-partner-share.s3.amazonaws.com/egm2008/us_nga_egm2008_1.tif

Hi @msmitherdc Who created / what was the creation process for that file ?

This file is 427 MB large. I don't think we want to include that directly in this github repo, nor in the PROJ-data tarball. We can upload it to cdn.proj.org, but should probably have a hint somewhere to keep track of such files that are CDN only. Probably a cdn_only_files.geojson stored in this repository that can be ingested by https://github.com/OSGeo/PROJ-data/blob/master/regenerate_index_html.py to include it in the generated https://github.com/OSGeo/PROJ-data/blob/master/files.geojson .
Otherwise people using projsync wouldn't get it.
And a mention of it in https://github.com/OSGeo/PROJ-data/blob/master/us_nga/us_nga_README.txt

CC @hobu @kbevers @jjimenezshaw

Looking at the metadata of the 1' file, I see:

  AREA_OR_POINT=Point
  target_crs_epsg_code=3855
  TIFFTAG_COPYRIGHT=Derived from work by USA NGA
  TIFFTAG_IMAGEDESCRIPTION=WGS 84 (EPSG::4979) to EGM2008 height (EPSG::3855)
  TYPE=VERTICAL_OFFSET_GEOGRAPHIC_TO_VERTICAL

compared to the existing us_nga_egm08_25.tif:

  area_of_use=World
  AREA_OR_POINT=Point
  target_crs_epsg_code=3855
  TIFFTAG_COPYRIGHT=Derived from work by NGA. Public Domain
  TIFFTAG_DATETIME=2019:12:27 00:00:00
  TIFFTAG_IMAGEDESCRIPTION=WGS 84 (EPSG:4979) to EGM2008 height (EPSG:3855). Converted from egm08_25.gtx (last modified at 2018/10/08)
  TYPE=VERTICAL_OFFSET_GEOGRAPHIC_TO_VERTICAL

So the optional "area_of_use=World" is missing. And the mention of the license (Public Domain?) in the TIFFTAG_COPYRIGHT too

I believe it was created from the .gtx file and yes, I just meant it to go into the cdn. Its referenced here https://earth-info.nga.mil/php/download.php?file=egm-08interpolation but not included. This is a file we've had for a while and not sure if it came direct from NGA or not. I can work with NGA to see if I can get something official

Hmm looks like the file came from https://s3-eu-west-1.amazonaws.com/download.agisoft.com/gtg/us_nga_egm2008_1.tif https://www.agisoft.com/downloads/geoids/ and it says Converted from [USA NGA](http://earth-info.nga.mil/) data under Public Domain license.

ok, we could potentially recreate the file from the data and hints at https://geographiclib.sourceforge.io/C++/doc/geoid.html#geoidinst since @cffk is a trusted source for that topic!

At first glance, agisoft file seems OK:

 gdallocationinfo -wgs84 us_nga_egm2008_1.tif 2 49
Report:
  Location: (10920P,2460L)
  Band 1:
    Value: 44.6771507263184

compared to GeographicLib PGM dataset which has a scale of 0.003 and offset of -108:

 gdallocationinfo -wgs84 geoids/egm2008-1.pgm 2 49
Report:
  Location: (120P,2460L)
  Band 1:
    Value: 50892

$ python -c "print 50892 * 0.003 - 108"
44.676000000000016

I’ve reached out to NGA to get any updates direct from them.

Procedure to generate a GeoTIFF file ranging from ~ -180 to ~ 180 deg longitude from the PGM file which ranges from ~ 0 to ~ 360:

gdal_translate geoids/egm2008-1.pgm west.vrt -srcwin 10800 0 10800 10801 -a_ullr -180.00833333333333333  90.008333333333333333  -0.0083333333333333  -90.008333333333333
gdal_translate geoids/egm2008-1.pgm east.vrt -srcwin 0 0 10800 10801
gdalbuildvrt out.vrt west.vrt east.vrt
gdal_translate out.vrt us_nga_egm08_1min.tif -co compress=deflate -co tiled=yes -co num_threads=12 -a_offset -108 -a_scale 0.003 -co predictor=2 -mo AREA_OR_POINT=Point -mo target_crs_epsg_code=3855 -a_srs EPSG:4979 # some missing metadata to add

echo 2 49 0 |  ~/proj/proj/build_cmake/bin/cct +proj=vgridshift +grids=us_nga_egm08_1min.tif +multiplier=1
  2.0000000000   49.0000000000      44.6760           inf

The resulting file is actually 78 MB large due to using integer storage !

The resulting file is actually 78 MB large due to using integer storage !

What?! the file us_nga_egm08_25.tif (with 2.5 min step) is 80 MB!

I remember that 3 years ago the file was available in the NGA webpage, and I downloaded it. Unfortunately I deleted it when I changed my laptop. I did compile the fortran code to produce an ascii file from the binary provided.

If the final output is really only 78 MB we could include it. I was expecting something more than 6 times the 2.5' version, that would be around 500 MB!
If it is finally ~500 MB, definitely we should do something special for it. (BTW, is it really needed at that 1 minute accuracy?)

If we can compress it that much, should we do the same to existing files? What are the drawbacks?

Having it in the list of alternatives but not in the standard package could be a problem with the option ONLY_BEST?

the file us_nga_egm08_25.tif (with 2.5 min step) is 80 MB!

yes, because if uses Float32 encoding, whereas the PGM file used for the 1' file uses scaled UInt16 data type. And the PREDICTOR=3 mode that applies to floating-point numbers is less efficient than PREDICTOR=2 on UInt16 data. The 2.5'' could also be reduced in a similar manner. The scaled encoding "only" allows a 0.003 m precision on the data, but the GeographicLib page mentions "However, the resulting quantization error (up to 1.5 mm) is typically smaller than the linear interpolation errors."

Having it in the list of alternatives but not in the standard package could be a problem with the option ONLY_BEST?

Well, if the 1' grid is listed in grid_alternatives but not available locally or PROJ_NETWORK is not enabled, then yes when enabling ONLY_BEST, it would logically complain

Here is the official file from NGA s3://grid-partner-share/egm2008/Und_min1x1_egm2008_isw=82_WGS84_TideFree.gz
and a presentation about EGM s3://grid-partner-share/egm2008/NPavlis&al_EGU2008.ppt

And I'm sorry that the file is 800Mb.

Thanks @msmitherdc
How can I download that s3 url?

https://grid-partner-share.s3.amazonaws.com/egm2008/Und_min1x1_egm2008_isw%3D82_WGS84_TideFree.gz but there's indeed a permission issue. Anyway I don't think that file is directly usable as it must contain some "harmonics coefficients" or things like that, and that's GeographicLib that has done the heavy work of converting that to a grid

Permissions should be fixed

If I understood the format correctly, this script should convert to gtx (actually the data is pretty similar)

import struct

def main(src, dst):
    cols = 21600
    rows = 10801
    with open(dst, 'wb') as gtx, open(src, 'rb') as infile:
        gtx.write(struct.pack('>ddddii', -90, 0, 1.0/60, 1.0/60, rows, cols))

        rs = []
        for count in range(rows):
            infile.read(4)
            line = infile.read(cols*4)
            if not line:
                return 1
            rs.append(line)
            infile.read(4)

        for row in reversed(rs):
            gtx.write(row)

    return 0

if __name__ == "__main__":
    src = 'Und_min1x1_egm2008_isw=82_WGS84_TideFree'
    dst = 'egm2008.gtx'
    exit(main(src, dst))

The file does not contain the harmonics, but the geoid undulations directly. Before and after each row it has 4 bites with the integer value 86400.

If I understood the format correctly, this script should convert to gtx (actually the data is pretty similar)

cool! I assumed it wasn't a grid... and do you get the same values as https://grid-partner-share.s3.amazonaws.com/egm2008/us_nga_egm2008_1.tif / https://s3-eu-west-1.amazonaws.com/download.agisoft.com/gtg/us_nga_egm2008_1.tif ?

Those tif files that you mention @rouault have 21601 columns, while the original has 21600.

In the file README_WGS84_2.pdf from https://earth-info.nga.mil/php/download.php?file=egm-08interpolation it says:

(3) Und_min1x1_egm2008_isw=82_WGS84_TideFree.gz

This file holds the 1'x1' geoid undulations in sequential, unformatted, binary format, with each geoid undulation stored as a single, REAL*4 value. Each record contains all of the geoid undulations for a single parallel band. Each geoid undulation is situated at the corner of its respective cell, such that the top-left value in the 1'x1' grid has a longitude of 0° East and a latitude of 90° North. The first record in the file contains the northern-most parallel, and the first value in each record is the western-most value for that parallel. Note that each 1'x1' value situated on the zero meridian appears only once, as the first value in its respective record, at a longitude of 0° East. These values are NOT repeated at the end of their respective records, at longitude of 360° East. As such, this file contains 10801 rows x 21600 columns of geoid undulations.

And some stats:

Global statistics (meters) for this file are:
Number of Values: 233301600
Percentage of Area: 100.000
Minimum Value: -106.910
Latitude of Minimum: 4.683
Longitude of Minimum: 78.767
Maximum Value: 85.840
Latitude of Maximum: -8.400
Longitude of Maximum: 147.367
Arithmetic Mean: -1.317
Area-Weighted Mean: -0.463
Arithmetic RMS: 29.274
Area-Weighted RMS: 30.596
Arithmetic S.Dev.: 29.244
Area-Weighted S.Dev.: 30.593

Apart from that, comparing with QGIS they have the same values (in the common columns).
Comparing with the file egm2008-1.pgm the maximum difference is 1.5 mm

(I'm always unsure about center or corner of the pixel. Also in the python script I did to convert to gtx. Anybody with more knowledge should check that, given the description I pasted above)

I'm always unsure about center or corner of the pixel.

This data is "pixel is point". I.e., the data give the heights at the corners of the cells (at lat + lon = integer multiples of 1' in this case).

Is there some agreement that the 1.5 mm max difference due to quantization if using a UInt16 data type + offset/scale is acceptable ?

For me 1.5 mm is fine. I asked a fried from the university and agrees. The model is worse than that.
Another question is... do we need the version with 1' when we already have it with 2.5'?

do we need the version with 1' when we already have it with 2.5'?

Taking a few "random" points, when doing bilinear interpolation, the difference between both versions is about within +/- 1.5 mm quantization error... It might be that I'm just out of luck, and it might be possible that in some areas, there is a significant difference... @msmitherdc Did you find that the 1' model was leading to more accurate results than the 2.5' one ?

($ echo 2.5 49.5 0 |  ~/proj/proj/build_cmake/bin/cct +proj=vgridshift +grids=us_nga_egm08_1min.tif +multiplier=1
  2.5000000000   49.5000000000       44.7390           inf
($ echo 2.5 49.5 0 | PROJ_NETWORK=ON  ~/proj/proj/build_cmake/bin/cct +proj=vgridshift +grids=us_nga_egm08_25.tif +multiplier=1
  2.5000000000   49.5000000000       44.7397           inf
($ echo -100.1 40.1 0 |  ~/proj/proj/build_cmake/bin/cct +proj=vgridshift +grids=us_nga_egm08_1min.tif +multiplier=1
-100.1000000000   40.1000000000      -24.7800           inf
($ echo -100.1 40.1 0 | PROJ_NETWORK=ON  ~/proj/proj/build_cmake/bin/cct +proj=vgridshift +grids=us_nga_egm08_25.tif +multiplier=1
-100.1000000000   40.1000000000      -24.7784           inf
($ echo -110.1 45.1 0 |  ~/proj/proj/build_cmake/bin/cct +proj=vgridshift +grids=us_nga_egm08_1min.tif +multiplier=1
-110.1000000000   45.1000000000       -7.5630           inf
($ echo -110.1 45.1 0 | PROJ_NETWORK=ON  ~/proj/proj/build_cmake/bin/cct +proj=vgridshift +grids=us_nga_egm08_25.tif +multiplier=1
-110.1000000000   45.1000000000       -7.5642           inf
($ echo 130.1 -50.1 0 |  ~/proj/proj/build_cmake/bin/cct +proj=vgridshift +grids=us_nga_egm08_1min.tif +multiplier=1
130.1000000000  -50.1000000000      -22.0530           inf
($ echo 130.1 -50.1 0 | PROJ_NETWORK=ON  ~/proj/proj/build_cmake/bin/cct +proj=vgridshift +grids=us_nga_egm08_25.tif +multiplier=1
130.1000000000  -50.1000000000      -22.0523           inf
$ echo 130.1 50.1 0 |  ~/proj/proj/build_cmake/bin/cct +proj=vgridshift +grids=us_nga_egm08_1min.tif +multiplier=1
130.1000000000   50.1000000000       14.9640           inf
$ echo 130.1 50.1 0 | PROJ_NETWORK=ON  ~/proj/proj/build_cmake/bin/cct +proj=vgridshift +grids=us_nga_egm08_25.tif +multiplier=1
130.1000000000   50.1000000000       14.9657           inf

actually, my issue was that reading a EGM08 vertical datum file via PDAL was returning this message

(pdal info readers.las Error) GDAL failure (1) PROJ: Cannot open https://cdn.proj.org/Und_min1x1_egm2008_isw=82_WGS84_TideFree.gz: HTTP error 404: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>Und_min1x1_egm2008_isw=82_WGS84_TideFree.gz</Key><RequestId>JA240W92M02FW4GK</RequestId><HostId>DCOiZUnwWIQdxL8bK9wkprDQiq7kvM1q9ujL2qaCbinfCZ6fbcDN4C6glQQDTtKO7nE1OgYvUEo=</HostId></Error>
(pdal info readers.las Error) GDAL failure (1) PROJ: Cannot open https://cdn.proj.org/Und_min1x1_egm2008_isw=82_WGS84_TideFree.gz: HTTP error 404: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>Und_min1x1_egm2008_isw=82_WGS84_TideFree.gz</Key><RequestId>JA240W92M02FW4GK</RequestId><HostId>DCOiZUnwWIQdxL8bK9wkprDQiq7kvM1q9ujL2qaCbinfCZ6fbcDN4C6glQQDTtKO7nE1OgYvUEo=</HostId></Error>

actually, my issue was that reading a EGM08 vertical datum file via PDAL was returning this message

ah, then it is something else... This error is a bit surprising as PROJ shouldn't try to fetch from the CDN a grid it doesn't know in grid_alternatives. I can't reproduce with:

echo 49 2 0 | PROJ_NETWORK=ON PROJ_DEBUG=ON bin/cs2cs EPSG:4326+3855 EPSG:4979

It tries to access a local Und_min1x1_egm2008_isw=82_WGS84_TideFree.gz file, but doesn't try to get it from the CDN

the geotiff keys on the file that triggers the lookup look like

Geotiff_Information:
   Version: 1
   Key_Revision: 1.0
   Tagged_Information:
      End_Of_Tags.
   Keyed_Information:
      GTModelTypeGeoKey (Short,1): ModelTypeProjected
      GTRasterTypeGeoKey (Short,1): RasterPixelIsPoint
      ProjectedCSTypeGeoKey (Short,1): PCS_WGS84_UTM_zone_18N
      ProjLinearUnitsGeoKey (Short,1): Linear_Meter
      VerticalCSTypeGeoKey (Short,1): Code-3855 (EGM2008 height)
      VerticalDatumGeoKey (Short,1): Code-1027 (EGM2008 geoid)
      VerticalUnitsGeoKey (Short,1): Linear_Meter
      Unknown-32777 (Ascii,6): "G1674"
      End_Of_Keys.
   End_Of_Geotiff.

the geotiff keys on the file that triggers the lookup look like

This should be reported as a PDAL issue with precise instructions. It might be a PROJ issue, or a GDAL issue or a PDAL issue, but that's unclear at that point.

My hunch is it is a libgeotiff issue, since we're just handing it all off to that in this case.