ks905383 / xagg

Aggregating gridded data (xarray) to polygons

Home Page: https://xagg.readthedocs.io/


Problem generating weights with DAYMET data

mewieczo opened this issue · comments

DAYMET data has 2-dimensional lat and lon coordinates and 1-dimensional x and y coordinates. I dropped lat and lon, and when running the pixel overlap step I get the following error:

ValueError: cannot reindex or align along dimension 'lon' because the index has duplicate values

Any suggestions or help is appreciated.
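For readers unfamiliar with the DAYMET layout, it can be mimicked with a small synthetic xarray dataset (the coordinate values below are made up; only the dimensionality matters):

```python
import numpy as np
import xarray as xr

# Mimic the DAYMET layout: 1-D projected coordinates (x, y) on a Lambert
# Conformal Conic grid, with 2-D lat/lon that vary along both dimensions.
x = np.arange(3) * 1000.0   # projected x (m)
y = np.arange(2) * 1000.0   # projected y (m)
lon2d = -100.0 + 0.01 * (y[:, None] / 1000.0 + x[None, :] / 1000.0)
lat2d = 45.0 + 0.01 * (y[:, None] / 1000.0 - x[None, :] / 1000.0)

ds = xr.Dataset(
    {"tmax": (("y", "x"), np.zeros((2, 3)))},
    coords={
        "x": x,
        "y": y,
        "lon": (("y", "x"), lon2d),
        "lat": (("y", "x"), lat2d),
    },
)

# lat/lon are 2-D auxiliary coordinates rather than 1-D index coordinates,
# so code that indexes or stacks on 1-D lat/lon will break.
print(ds.lon.dims)  # ('y', 'x')
```

Dropping lat/lon (as described above) then leaves only the projected x/y, which is why code expecting geographic coordinates fails.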
Mike

@ks905383: Could it be that xagg currently only works for gridded datasets that have 1D lon,lat coordinates?

The DAYMET data has 1D (x,y) Lambert Conformal Conic projected coordinates, so the (lon,lat) arrays are 2D.

Yeah - it's currently only tested / written for rectangular grids.

Expanding processing to work with non-rectangular grids should not be the most complicated thing in the world, and might be achievable mainly through expanding the functionality of create_raster_polygons. Maybe let's keep this open until that's done.

In the meantime, I'll see about putting in a helpful error message if a non-rectangular grid is detected, or one where the coordinates aren't the dimensions etc.
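The helpful error message mentioned here could look something like the following sketch. `check_grid` is a hypothetical helper (not part of xagg's API), and for self-containment it takes a plain dict of coordinate-name → numpy array rather than xagg's internal structures:

```python
import numpy as np

def check_grid(coords):
    """Raise an informative error when a grid isn't the rectangular,
    1-D-lat/lon layout xagg currently expects.
    Hypothetical helper, not xagg's actual API."""
    for name in ("lat", "lon"):
        if name not in coords:
            raise ValueError(f"no '{name}' coordinate found in dataset")
        if coords[name].ndim != 1:
            raise ValueError(
                f"'{name}' is {coords[name].ndim}-D; xagg currently only "
                "supports rectangular grids with 1-D lat/lon coordinates "
                "(curvilinear grids such as DAYMET's are not yet supported)."
            )

# A DAYMET-style grid (2-D lat/lon) now fails loudly instead of cryptically:
daymet_like = {"lat": np.zeros((2, 3)), "lon": np.zeros((2, 3))}
try:
    check_grid(daymet_like)
except ValueError as e:
    print(e)
```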

@ks905383, okay, it's not just user error. Good! 😄
Do you know if the rasterstats package might work for the use case of @mewieczo?

Here is an example of how I generated cells for Daymet grids. It might be of some help.

https://github.com/nhm-usgs/onhm-fetcher-parser/blob/master/pkg/CalcDaymetWeights.py

> @ks905383, okay, it's not just user error. Good! 😄
> Do you know if the rasterstats package might work for the use case of @mewieczo?

Hm... it might. I personally haven't used rasterstats, but its method seems relatively robust (any pixel within the polygon boundary is included, either by its center or by any overlapping part of the pixel), as long as the CRSes are well-defined. You won't get the area-averaging functionality of xagg, but depending on how much accuracy you need (i.e., how big your pixels are relative to your polygons, and how bad it is if a pixel that only barely touches a polygon is included in the weighting), you might not need it anyway.
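To illustrate the tradeoff described here, a toy numpy comparison of area-weighted averaging (what xagg does) versus unweighted averaging over all touched pixels (the rasterstats-style inclusion described above); the values and overlap fractions are invented:

```python
import numpy as np

# Toy example: 4 pixels touch a polygon with different overlap fractions.
values = np.array([10.0, 20.0, 30.0, 40.0])
overlap = np.array([1.0, 1.0, 0.5, 0.05])  # fraction of each pixel inside

# Area-weighted mean: barely-touching pixels contribute almost nothing.
area_weighted = np.sum(values * overlap) / np.sum(overlap)

# Unweighted mean: every touched pixel counts equally.
unweighted = values.mean()

print(f"area-weighted: {area_weighted:.2f}")  # 18.43
print(f"unweighted:    {unweighted:.2f}")     # 25.00
```

The gap between the two shrinks as pixels get small relative to the polygon, which is the accuracy consideration mentioned above.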

> Here is an example of how I generated cells for Daymet grids. It might be of some help.
>
> https://github.com/nhm-usgs/onhm-fetcher-parser/blob/master/pkg/CalcDaymetWeights.py

oh nice, thanks for that - I'll take a look

@ks905383, I would be interested in helping add the capability for processing structured non-uniform grids to xagg. I've got code to generate lat_bnds and lon_bnds. Would you have the time and interest to meet and discuss how it would make sense to add that capability? Cheers, Kevin.

Hey @rmcd-mscb , apologies for the long radio silence here.

That actually sounds great - we could chat later this week or next week and see about integrating it?

Thanks @ks905383 - I sent a calendar invite.

Hopefully, all major processing changes would only have to be in create_raster_polygons and get_bnds.

The remaining issues beyond that are largely semantic, in particular related to hardcoded calls to lat, lon. This may be fixable through implementing cf_xarray in xagg (Issue #1). Here's a quick (non-comprehensive) list of parts of the code where this might cause issues:

In xa.core

  • Line 157 needs to be verified to make sure it still manages to create the bounding box without adding extra dimensions
  • Line 172 should work fine without modification (broadcasting out the bounds), but there may be dimensional issues
  • Line 180 probably needs major changes
  • Generally, references to .lat and .lon may be problematic. This may be where cf_xarray needs to be incorporated
    • All the .stack(loc=('lat','lon')) calls, for example
    • Or the block of code starting in Line 205 about assigning lat/lon values to the geopandas pixel polygons
  • In process_weights(): xesmf can deal with non-rectangular grids, so this may be fine as is?

In xa.aux

  • xa.fix_ds()'s references to lat, lon need to be double-checked, specifically the sortby(ds.lat)/lon calls
  • xa.get_bnds() of course...
  • xa.subset_find()
    • subset_find is mainly used in optimization, to do calculations on just a geographic subset of the data. If it doesn't instantly work with non-rectangular grids, can just skip the subsetting if it's a non-rectangular grid with a warning.
    • (but you can feed stacked loc dim'ed arrays into it, so maybe that's easy to deal with?)
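The "hardcoded calls to lat, lon" problem above is essentially a name-detection problem. A crude, stdlib-only stand-in for what cf_xarray's `ds.cf["latitude"]` / `ds.cf["longitude"]` accessors do properly (the alias lists here are illustrative, not exhaustive):

```python
def find_latlon_names(coord_names):
    """Guess latitude/longitude coordinate names from a collection of
    coordinate names. Sketch only; cf_xarray also inspects CF attributes
    like units and standard_name, which this does not."""
    lat_aliases = {"lat", "latitude", "nav_lat"}
    lon_aliases = {"lon", "long", "longitude", "nav_lon"}
    lat = next((n for n in coord_names if n.lower() in lat_aliases), None)
    lon = next((n for n in coord_names if n.lower() in lon_aliases), None)
    if lat is None or lon is None:
        raise ValueError("could not identify lat/lon coordinate names")
    return lat, lon

print(find_latlon_names(["time", "Latitude", "Longitude"]))
# ('Latitude', 'Longitude')
```

With something like this, the hardcoded `.stack(loc=('lat','lon'))` calls could stack on whatever names were detected instead.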

I was thinking about using xagg with DAYMET again, and thinking, "okay, it doesn't have 1D lon,lat coordinates, but it does have 1D x,y coordinates" (Lambert Conformal projection, see this notebook), so I'm wondering if I could just:

  • modify xagg to recognize x,y
  • supply geopandas polygons in the same projection
  • change the parts of the code that assume lon/lat (EPSG:4326) units to instead read the input grid and polygon CRS and then convert to EPSG:693{1,2,3}

I'm thinking the real limitation is the algorithm that needs 1D coordinates...

What would be the problems with this approach?