ks905383 / xagg

Aggregating gridded data (xarray) to polygons

Home Page: https://xagg.readthedocs.io/


Aggregation breaks with small enough shapefiles

bradyrx opened this issue · comments

I'm having the .aggregate(...) step break with shapefiles whose polygons are seemingly too small. This example uses the Admin 2 level Brazilian municipalities. I've tested with some ERA5 data as well as the xarray tutorial data below and the issue persists, so it's definitely due to the shapefiles rather than the gridded data.

If the pooch retrieval below doesn't work, the Admin 2 shapefiles are here: https://data.humdata.org/dataset/brazil-administrative-level-0-boundaries.

import xarray as xr
import pooch
import geopandas as gpd
import xagg as xa

# Load in the Brazilian municipalities (Admin 2)
file = pooch.retrieve(
    "https://data.humdata.org/dataset/f5f0648e-f085-4c85-8242-26bf6c942f40/resource/b4bf8e52-2de8-443f-a72d-287c1ef6b462/download/bra_adm_ibge_2020.gdb.zip",
    None,
)
municipalities = gpd.read_file("zip://" + file)

# Set CRS since the shapefile does not come with a CRS
municipalities = municipalities.set_crs("EPSG:4326")

# Load in some global tutorial data from xarray
ds = xr.tutorial.open_dataset("eraint_uvz")
ds = ds.isel(level=0, month=0)["u"].to_dataset()

# Working case. Subset to 10 polygons that do work.
# Need to reset index since it breaks if index isn't continuous
# from zero.
_df = municipalities.iloc[400:410]
_df = _df.reset_index()
wm = xa.pixel_overlaps(ds, _df)
aggregated = xa.aggregate(ds, wm)

# Breaking case.
_df = municipalities.iloc[415:420]
_df = _df.reset_index()
wm = xa.pixel_overlaps(ds, _df)
aggregated = xa.aggregate(ds, wm)

Here's the traceback on the broken subset (I believe many other polygons in this dataset are small enough to trigger it as well). The 0-dimensional array in the error suggests that the weight array is empty.

IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-63-c29dad4ffcd0> in <module>
      1 wm = xa.pixel_overlaps(ds, _df)
----> 2 aggregated = xa.aggregate(ds, wm)

~/miniconda3/envs/analysis/lib/python3.8/site-packages/xagg/core.py in aggregate(ds, wm)
    406                         # Replace overlapping pixel areas with nans if the corresponding pixel
    407                         # is only composed of nans
--> 408                         tmp_areas[np.array(np.isnan(ds[var].isel(loc=wm.agg.iloc[poly_idx,:].pix_idxs)).all(other_dims).values)] = np.nan
    409                         # Calculate the normalized area+weight of each pixel (taking into account
    410                         # nans)

IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
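For context, here's a minimal numpy sketch of what I think is happening (this is my guess at the failure mode, not the actual xagg internals): when a polygon overlaps only a single pixel, the per-polygon overlap-area array can collapse to a 0-dimensional array, and boolean indexing a 0-d array raises exactly this error.

```python
import numpy as np

# A 0-dimensional array standing in for the collapsed overlap areas
# of a polygon that touches only one pixel.
tmp_areas = np.array(0.37)        # 0-d scalar, not a length-1 vector
mask = np.array([True])           # 1-d boolean mask from the nan check

try:
    tmp_areas[mask] = np.nan      # IndexError: too many indices for array
    raised = False
except IndexError:
    raised = True

# Promoting to at least 1-d before masking avoids the collapse.
safe_areas = np.atleast_1d(np.array(0.37))
safe_areas[mask] = np.nan
```

If that's the right diagnosis, an `np.atleast_1d` (or similar) guard around the single-pixel case would be one way to fix it upstream.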

You can also isolate the single polygon causing this:

# Select row 415 which seems to be too small.
polygon = gpd.GeoDataFrame(municipalities.iloc[415]).T
polygon = polygon.set_crs("EPSG:4326")
polygon = polygon.reset_index()
wm = xa.pixel_overlaps(ds, polygon)
aggregated = xa.aggregate(ds, wm)
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-72-e6fb927d8b50> in <module>
      1 wm = xa.pixel_overlaps(ds, polygon)
----> 2 aggregated = xa.aggregate(ds, wm)

~/miniconda3/envs/analysis/lib/python3.8/site-packages/xagg/core.py in aggregate(ds, wm)
    406                         # Replace overlapping pixel areas with nans if the corresponding pixel
    407                         # is only composed of nans
--> 408                         tmp_areas[np.array(np.isnan(ds[var].isel(loc=wm.agg.iloc[poly_idx,:].pix_idxs)).all(other_dims).values)] = np.nan
    409                         # Calculate the normalized area+weight of each pixel (taking into account
    410                         # nans)

IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed

A hacky fix for now would be to loop through each polygon individually inside a try/except block, catch the IndexError, and fall back to selecting the grid cell closest to the polygon's centroid, but that's of course not ideal.
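That workaround could look roughly like this. This is a hypothetical sketch with toy numpy stand-ins for the grid and the aggregation call (not xagg's actual API); `aggregate_fn` would wrap the real xa.pixel_overlaps/xa.aggregate pair.

```python
import numpy as np

# Toy stand-in for the gridded data: a 2-degree global grid of values.
lats = np.arange(-89.0, 91.0, 2.0)
lons = np.arange(-179.0, 181.0, 2.0)
values = np.arange(lats.size * lons.size, dtype=float).reshape(lats.size, lons.size)

def nearest_cell(lat, lon):
    """Fallback: value of the grid cell closest to a point."""
    i = int(np.abs(lats - lat).argmin())
    j = int(np.abs(lons - lon).argmin())
    return values[i, j]

def aggregate_with_fallback(centroids, aggregate_fn):
    """Try the normal area-weighted aggregation for each polygon
    (represented here by its (lat, lon) centroid); on the empty-weights
    IndexError, fall back to the nearest grid cell instead."""
    out = []
    for lat, lon in centroids:
        try:
            out.append(aggregate_fn(lat, lon))
        except IndexError:
            out.append(nearest_cell(lat, lon))
    return out
```

With the real data, the per-polygon loop would slice one row of the GeoDataFrame at a time and call xa.pixel_overlaps/xa.aggregate on it, which is exactly the overhead that defeats the point of a vectorized aggregation.
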

I have a case with a lot of small polygons, and yes, looping per polygon would defeat the purpose of an aggregation function.

Should've closed this a while ago - this was fixed with #10 I believe. At the very least, the error is no longer reproducible.