Unidata / netcdf4-python

netcdf4-python: python/numpy interface to the netCDF C library

Home Page: http://unidata.github.io/netcdf4-python

Masking not working properly with _Unsigned

dopplershift opened this issue · comments

I have a Level 2 QPE product from GOES-16 that caused some support issues. The relevant CDL is:

netcdf satellite/goes16/GOES16/Products/RainRateQPE/FullDisk/current/OR_ABI-L2-RRQPEF-M3_G16_s20181072300427_e20181072311194_c20181072311280.nc {
  dimensions:
    y = 5424;
    x = 5424;
    number_of_time_bounds = 2;
    band = 1;
    number_of_image_bounds = 2;
    number_of_sunglint_angle_bounds = 2;
    number_of_LZA_bounds = 2;
    number_of_SZA_bounds = 2;
    number_of_lat_bounds = 2;
    number_of_rainfall_rate_bounds = 2;
  variables:
    short RRQPE(y=5424, x=5424);
      :_FillValue = -1S; // short
      :long_name = "ABI L2+ Rainfall Rate - Quantitative Prediction Estimate";
      :standard_name = "rainfall_rate";
      :_Unsigned = "true";
      :valid_range = 0S, -6S; // short
      :scale_factor = 0.00152602f; // float
      :add_offset = 0.0f; // float
      :units = "mm h-1";
      :resolution = "y: 0.000056 rad x: 0.000056 rad";
      :coordinates = "latitude retrieval_local_zenith_angle quantitative_local_zenith_angle solar_zenith_angle t y x";
      :grid_mapping = "goes_imager_projection";
      :cell_methods = "latitude: point (good quality pixel produced) retrieval_local_zenith_angle: point (good or degraded quality pixel produced) quantitative_local_zenith_angle: sum (good quality pixel produced) solar_zenith_angle: sum (good quality pixel produced) t: point area: point";
      :ancillary_variables = "DQF";

Note the values in valid_range: they are stored as a signed type, as the variable is, but they only make sense as a range if you reinterpret the signed bit pattern (-6) as unsigned (65530). The values in valid_range are not incorrect, though, since the conventions specify that the attribute must have the same type as the variable.
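The two readings of the same bit pattern are easy to see with NumPy (illustrative only, not netCDF4-python internals):

```python
import numpy as np

# valid_range as stored in the file: signed shorts.
valid_range = np.array([0, -6], dtype=np.int16)

# The same bits reinterpreted as unsigned give the intended upper bound:
# -6 reads back as 65530.
print(valid_range.view(np.uint16))
```

The `view` call reinterprets the bits in place, which is the "widening" behavior at issue here, as opposed to a cast, which would preserve the value -6.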

The current out-of-the-box behavior is that netCDF4-python returns an entirely masked variable. The work-around is to disable automatic masking.

The correct behavior, IMO, is for valid_range and friends to get the same unsigned treatment as the data values.
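A minimal NumPy sketch of the difference, using hypothetical raw values (this is not the library's actual code path):

```python
import numpy as np

# Hypothetical raw int16 data; -1 is the _FillValue, and -6 is the bit
# pattern of the unsigned maximum 65530.
raw = np.array([0, 100, -6, -1], dtype=np.int16)
valid_range = np.array([0, -6], dtype=np.int16)

# Signed interpretation: the bounds [0, -6] are an empty range,
# so every value falls outside it and gets masked.
bad_mask = (raw < valid_range[0]) | (raw > valid_range[1])

# Unsigned interpretation: reinterpret both the data and the bounds
# as uint16 before comparing; only the fill value is masked.
u = raw.view(np.uint16)
vmin, vmax = valid_range.view(np.uint16)
good_mask = (u < vmin) | (u > vmax)
```

With the signed bounds, `bad_mask` is all True, matching the fully masked array reported above; with the unsigned bounds, only the fill value (65535 > 65530) is masked.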

I've included the sample file.

I'm traveling so I won't be able to look at this till next week. Have you tried the latest master?

The valid range is assumed to be of the same type as the netcdf variable (signed short integer) and the conversion to unsigned short is considered to be part of the scale/offset operation (a numpy view is created after the mask is created).

In this case, valid_range is the same type as the variable. The problem is that valid_range is given as: (0, -6). These are the same (and correct) bit patterns regardless of signed/unsigned. The problem is that for the original signed data, masking values < 0 or > -6 masks everything, whereas the same operation on the unsigned data, masking < 0 or > 65530, produces the desired results.

Yes, but isn't the valid_range (also missing_value, _FillValue) supposed to apply to the native variable data, which in this case is signed?

We are currently treating the _Unsigned attribute as part of the scaling operation, after the masking is applied.

Hmmm...I just found this in the netCDF User's Guide under Best Practices:

If the variable is unsigned the valid_range values should be widened if needed and stored as unsigned integers.

@lesserwhirls Does netCDF-java handle valid_range? If so, what does it do with _Unsigned combined with valid_range?

@dopplershift - yes, netCDF-java tries to deal with valid_range. I'm not sure of the details, as the code has changed between 4.6.x and 5.0. @cwardgar was in that code recently to deal with _FillValue, so he may have the best understanding at this point.

Does netCDF-java handle valid_range? If so, what does it do with _Unsigned combined with valid_range?

Yes it does. First, it widens valid_range to the next largest integral type. This allows a bit pattern which previously may have been interpreted as negative (because e.g. we're storing an unsigned short in a short) to be properly interpreted as a non-negative number.

Then, it applies scale and offset. The result will be a double. For the dataset you provided, NJ calculates valid_min == 0 and valid_max == 100.00009070616215. That seems correct, yeah?
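That number can be reproduced outside netCDF-java with a short sketch, using the scale_factor and add_offset values from the CDL above (widening the float scale to double, as netCDF-java does):

```python
import numpy as np

# Widen valid_range from int16 to uint16, then apply scale/offset.
valid_range = np.array([0, -6], dtype=np.int16).view(np.uint16)
scale_factor = np.float64(np.float32(0.00152602))
add_offset = 0.0

valid_min, valid_max = valid_range * scale_factor + add_offset
print(valid_min, valid_max)  # approximately 0.0 and 100.00009
```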

Does netCDF-java do the same with _FillValue and missing_value? (cast to the larger integral type)

@jswhit Yes, missing_value is widened first. _FillValue is not! That's likely a bug. Thanks for pointing that out.

And to be clear, valid_* and missing_value are widened before scale/offset are applied, not merely cast. For example:

short s = -6;
System.out.println((int) s);     // Cast: -6
System.out.println(s & 0xffff);  // Widen: 65530

The problem that I see with _FillValue is that it is being cast (to double) before scale/offset right now, not widened.

With the changes in pull request #797, the following script

from netCDF4 import Dataset
import matplotlib.pyplot as plt

nc = Dataset('OR_ABI-L2-RRQPEF-M3_G16_s20181072300427_e20181072311194_c20181072311280.nc')
data = nc['RRQPE'][:]
print(data.dtype, data.min(), data.max())
plt.imshow(data, cmap=plt.cm.jet, vmin=0, vmax=100)
plt.colorbar()
plt.show()

produces

float32 0.0 100.00009

and the attached png file.

Can someone try this with netcdf-java and see if they get the same?

[attached image: issue794]

That's what I get using toolsUI.

pull request #797 merged

I have tried to output the valid_range of the dataset, and I still get [0, -6]. Is it supposed to be [0, 100] or [0.0, 100.0]?

from netCDF4 import Dataset

nc = Dataset('OR_ABI-L2-RRQPEF-M3_G16_s20181072300427_e20181072311194_c20181072311280.nc')
data = nc['RRQPE'][:]
print(data.dtype, data.min(), data.max())
print(nc['RRQPE'].getncattr('valid_range'))

float32 0.0 100.00009

[ 0, -6]

The fix does not have the library change the valid_range attribute; it only makes the automatic masking use the properly interpreted data. IMO, rewriting attributes is outside the scope here.
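Since the attribute comes back exactly as stored, anyone who wants the unpacked range can widen and scale it themselves (a sketch; the int16 array stands in for what `getncattr('valid_range')` returns, and the scale_factor comes from the CDL above):

```python
import numpy as np

# What getncattr('valid_range') hands back: the raw signed shorts.
valid_range = np.array([0, -6], dtype=np.int16)

# Widen to unsigned, then apply the variable's scale_factor.
unpacked = valid_range.view(np.uint16) * np.float32(0.00152602)
print(unpacked)  # roughly [0., 100.00009]
```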