Aggregation breaks with small enough shapefiles
bradyrx opened this issue · comments
I'm having the `.aggregate(...)` step break with some shapefiles that are seemingly too small. This uses the Admin 2 level Brazilian municipalities. I've tested on some ERA5 data as well as some xarray tutorial data (below) and the issue persists, so this is definitely due to the shapefiles.

If the `pooch` retrieval below doesn't work, the Admin 2 shapefiles are here: https://data.humdata.org/dataset/brazil-administrative-level-0-boundaries.
```python
import xarray as xr
import pooch
import geopandas as gpd
import xagg as xa

# Load in the Brazilian municipalities (Admin 2)
file = pooch.retrieve(
    "https://data.humdata.org/dataset/f5f0648e-f085-4c85-8242-26bf6c942f40/resource/b4bf8e52-2de8-443f-a72d-287c1ef6b462/download/bra_adm_ibge_2020.gdb.zip",
    None,
)
municipalities = gpd.read_file("zip://" + file)

# Set CRS since the shapefile does not come with one
municipalities = municipalities.set_crs("EPSG:4326")

# Load in some global tutorial data from xarray
ds = xr.tutorial.open_dataset("eraint_uvz")
ds = ds.isel(level=0, month=0)["u"].to_dataset()

# Working case: subset to 10 polygons that do work.
# Need to reset the index, since aggregation breaks if the
# index isn't contiguous from zero.
_df = municipalities.iloc[400:410]
_df = _df.reset_index()
wm = xa.pixel_overlaps(ds, _df)
aggregated = xa.aggregate(ds, wm)

# Breaking case.
_df = municipalities.iloc[415:420]
_df = _df.reset_index()
wm = xa.pixel_overlaps(ds, _df)
aggregated = xa.aggregate(ds, wm)
```
Here's the traceback on the broken subset (I believe there are many other polygons in this dataset small enough to trigger this). The error suggests the weight array for the polygon is empty.
```
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-63-c29dad4ffcd0> in <module>
      1 wm = xa.pixel_overlaps(ds, _df)
----> 2 aggregated = xa.aggregate(ds, wm)

~/miniconda3/envs/analysis/lib/python3.8/site-packages/xagg/core.py in aggregate(ds, wm)
    406             # Replace overlapping pixel areas with nans if the corresponding pixel
    407             # is only composed of nans
--> 408             tmp_areas[np.array(np.isnan(ds[var].isel(loc=wm.agg.iloc[poly_idx,:].pix_idxs)).all(other_dims).values)] = np.nan
    409             # Calculate the normalized area+weight of each pixel (taking into account
    410             # nans)

IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
```
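The failing line is consistent with this reading: if a polygon is small enough that it effectively overlaps no grid cells, the per-polygon area array collapses to a 0-dimensional scalar, and boolean-masking it with a 1-dimensional mask raises exactly this `IndexError`. A minimal numpy reproduction (the variable names are illustrative, not xagg internals):

```python
import numpy as np

# For a sufficiently small polygon, the per-polygon area array can
# collapse to a 0-d scalar instead of a length-1 array
tmp_areas = np.array(1.0)   # 0-d scalar
mask = np.array([True])     # 1-d boolean mask over the expected pixels

try:
    # Boolean-masking a 0-d array with a 1-d mask is invalid
    tmp_areas[mask] = np.nan
except IndexError as err:
    print(err)  # too many indices for array: array is 0-dimensional, but 1 were indexed
```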
You can also isolate the single polygon causing this:
```python
# Select row 415, which seems to be too small.
polygon = gpd.GeoDataFrame(municipalities.iloc[415]).T
polygon = polygon.set_crs("EPSG:4326")
polygon = polygon.reset_index()
wm = xa.pixel_overlaps(ds, polygon)
aggregated = xa.aggregate(ds, wm)
```
```
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-72-e6fb927d8b50> in <module>
      1 wm = xa.pixel_overlaps(ds, polygon)
----> 2 aggregated = xa.aggregate(ds, wm)

~/miniconda3/envs/analysis/lib/python3.8/site-packages/xagg/core.py in aggregate(ds, wm)
    406             # Replace overlapping pixel areas with nans if the corresponding pixel
    407             # is only composed of nans
--> 408             tmp_areas[np.array(np.isnan(ds[var].isel(loc=wm.agg.iloc[poly_idx,:].pix_idxs)).all(other_dims).values)] = np.nan
    409             # Calculate the normalized area+weight of each pixel (taking into account
    410             # nans)

IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
```
A hacky fix right now would be to loop through each polygon individually in a try/except block, catch the `IndexError`, and select the grid cell closest to the polygon's center instead, but that's of course not ideal.
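For illustration, that try/except pattern could look like the sketch below. `safe_aggregate`, its arguments, and the simple area-weighted mean are hypothetical stand-ins for what xagg computes internally; only the fallback structure is the point:

```python
import numpy as np

def safe_aggregate(values, pixel_weights, nearest_value):
    """Per-polygon aggregation with an IndexError fallback.

    values / pixel_weights: data and overlap weights for the pixels the
    polygon touches; nearest_value: the value of the grid cell closest to
    the polygon center. All hypothetical stand-ins, not xagg API.
    """
    v = np.array(values, dtype=float)
    w = np.array(pixel_weights, dtype=float)
    try:
        # Mirrors the failing line above: masking a 0-d weight array
        # with a 1-d mask raises IndexError
        w[np.isnan(v)] = np.nan
        return np.nansum(v * w) / np.nansum(w)
    except IndexError:
        # Polygon too small to overlap any pixel: fall back to the
        # nearest grid cell
        return nearest_value

# Normal polygon: two overlapping pixels, area-weighted mean
print(safe_aggregate([1.0, 3.0], [0.5, 0.5], nearest_value=2.0))  # 2.0

# Tiny polygon: the weight array collapses to a scalar -> fallback
print(safe_aggregate([2.5], 1.0, nearest_value=7.0))  # 7.0
```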
I have a case with a lot of small pieces, and yes, that would defeat the purpose of an aggregation function.
Should've closed this a while ago - this was fixed with #10 I believe. At the very least, the error is no longer reproducible.