[BUG]: cuspatial Numba JIT compiler cannot pick up the geopandas table when using the apply function
danbull-lynker opened this issue
Version
23.10.0
On which installation method(s) does this occur?
Source
Describe the issue
I have built a function that I apply to each row of a geopandas table. The function finds the neighbouring polygon with the longest shared boundary and updates a ‘landclass’ field, which then allows polygons to be dissolved based on the updates. A few other selection parameters are applied, i.e. some rows are excluded from or included in selection as a merge object. This works well in geopandas but is slow once the shapefiles exceed 1000 records, hence cuSpatial could be useful as I have a lot of processing to do.
Using cuSpatial, I'm getting the following error:
Minimum reproducible example
import os
import glob
import numpy as np
import geopandas as gpd
import json
from timeit import default_timer as timer
import itertools
import cuspatial
rasters = '/home/dan/projects/COWetland/tmp/composite15_comp_masked/'
vector_output ='/home/dan/projects/COWetland/tmp/composite15_comp_vectors3/'
inferences = glob.glob(rasters + '/m_3910839_ne*.tif')
composite = True
##gets the neighbouring polygons with the longest side
def getlongestsidelandclass(row):
    ##arguments
    min_size, includepolys, includetarget, excludetarget =1000, [0,3,4,5,6], [3,4,5,6], []
    landclass = row.landclass
    if row.geometry.area<min_size and row.landclass in includepolys:
        target_polys = gdf[gdf.id !=row.id]
        if len(excludetarget)>0:
            exc_target_polys = target_polys[target_polys.landclass.isin(excludetarget)]
            exc_neighbors = exc_target_polys.geometry.intersection(row['geometry'])
            exc_blength = exc_neighbors.length
            exc_blength = exc_blength[exc_blength!=0].shape[0]
        else:
            exc_blength =0
        if exc_blength ==0:
            inc_target_polys = target_polys[target_polys.landclass.isin(includetarget)]
            inc_neighbors = inc_target_polys.geometry.intersection(row['geometry'])
            neighbors = inc_target_polys.geometry.intersection(row['geometry'])
            blength = neighbors.length
            if blength[blength!=0].shape[0]>0:
                maxx = neighbors.length.idxmax()
                #print (maxx)
                #gdflandclass = gdf[gdf.id == maxx]
                landclass = gdf.loc[maxx,'landclass']
    return landclass
for inference in inferences:
    vector_tmp = os.path.join(vector_output, os.path.basename(inference)[:-4] + '_tmp.shp')
    gdfx = gpd.read_file(vector_tmp)
    gdf = cuspatial.GeoDataFrame(gdfx)
    #gdf = cuspatial.from_geopandas(gdfx)
    gdf["id"] = gdf.index
    if composite:
        print ('removing wetlands < 1000m2')
        start = timer()
        gdf["landclass"]= gdf.apply(getlongestsidelandclass, axis=1)
        gdf = gdf.dissolve(by='landclass', as_index=False)
        gdf = gdf.explode(index_parts=False)
        gdf["id"] = gdf.index
        fin = timer()
        print("--- %s seconds ---" % str(fin - start))
    gdf = gdf[gdf.landclass !=0]
    if not gdf.empty:
        gdf.to_file(vector)
Relevant log output
(rapids3) dan@ox:~/projects/COWetland/code/create_wetland$ python vector_reduce.py
removing wetlands < 1000m2
Traceback (most recent call last):
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/cudf/core/indexed_frame.py", line 2359, in _apply
kernel, retty = _compile_or_get(
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/nvtx/nvtx.py", line 115, in inner
result = func(*args, **kwargs)
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/cudf/core/udf/utils.py", line 268, in _compile_or_get
kernel, scalar_return_type = kernel_getter(frame, func, args)
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/cudf/core/udf/row_function.py", line 143, in _get_row_kernel
scalar_return_type = _get_udf_return_type(row_type, func, args)
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/nvtx/nvtx.py", line 115, in inner
result = func(*args, **kwargs)
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/cudf/core/udf/utils.py", line 88, in _get_udf_return_type
ptx, output_type = cudautils.compile_udf(func, compile_sig)
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/cudf/utils/cudautils.py", line 126, in compile_udf
ptx_code, return_type = cuda.compile_ptx_for_current_device(
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/cuda/compiler.py", line 319, in compile_ptx_for_current_device
return compile_ptx(pyfunc, sig, debug=debug, lineinfo=lineinfo,
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
return func(*args, **kwargs)
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/cuda/compiler.py", line 289, in compile_ptx
cres = compile_cuda(pyfunc, return_type, args, debug=debug,
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
return func(*args, **kwargs)
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/cuda/compiler.py", line 230, in compile_cuda
cres = compiler.compile_extra(typingctx=typingctx,
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/compiler.py", line 762, in compile_extra
return pipeline.compile_extra(func)
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/compiler.py", line 460, in compile_extra
return self._compile_bytecode()
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/compiler.py", line 528, in _compile_bytecode
return self._compile_core()
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/compiler.py", line 507, in _compile_core
raise e
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/compiler.py", line 494, in _compile_core
pm.run(self.state)
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/compiler_machinery.py", line 368, in run
raise patched_exception
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/compiler_machinery.py", line 356, in run
self._runPass(idx, pass_inst, state)
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
return func(*args, **kwargs)
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/compiler_machinery.py", line 311, in _runPass
mutated |= check(pss.run_pass, internal_state)
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/compiler_machinery.py", line 273, in check
mangled = func(compiler_state)
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/typed_passes.py", line 110, in run_pass
typemap, return_type, calltypes, errs = type_inference_stage(
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/typed_passes.py", line 86, in type_inference_stage
infer.build_constraint()
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/typeinfer.py", line 1039, in build_constraint
self.constrain_statement(inst)
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/typeinfer.py", line 1386, in constrain_statement
self.typeof_assign(inst)
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/typeinfer.py", line 1461, in typeof_assign
self.typeof_global(inst, inst.target, value)
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/typeinfer.py", line 1561, in typeof_global
typ = self.resolve_value_type(inst, gvar.value)
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/typeinfer.py", line 1482, in resolve_value_type
raise TypingError(msg, loc=inst.loc)
numba.core.errors.TypingError: Failed in cuda mode pipeline (step: nopython frontend)
Untyped global name 'gdf': Cannot determine Numba type of <class 'cuspatial.core.geodataframe.GeoDataFrame'>
File "vector_reduce.py", line 24:
def getlongestsidelandclass(row):
<source elided>
if row.geometry.area<min_size and row.landclass in includepolys:
target_polys = gdf[gdf.id !=row.id]
^
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/dan/projects/COWetland/code/create_wetland/vector_reduce.py", line 56, in <module>
gdf["landclass"]= gdf.apply(getlongestsidelandclass, axis=1)
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/nvtx/nvtx.py", line 115, in inner
result = func(*args, **kwargs)
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/cudf/core/dataframe.py", line 4388, in apply
return self._apply(func, _get_row_kernel, *args, **kwargs)
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/nvtx/nvtx.py", line 115, in inner
result = func(*args, **kwargs)
File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/cudf/core/indexed_frame.py", line 2363, in _apply
raise ValueError(
ValueError: user defined function compilation failed.
Environment details
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
_libgcc_mutex=0.1=main
_openmp_mutex=5.1=1_gnu
attrs=23.1.0=pypi_0
bzip2=1.0.8=h7b6447c_0
ca-certificates=2023.08.22=h06a4308_0
cachetools=5.3.2=pypi_0
certifi=2023.7.22=pypi_0
click=8.1.7=pypi_0
click-plugins=1.1.1=pypi_0
cligj=0.7.2=pypi_0
cubinlinker-cu11=0.3.0.post1=pypi_0
cuda-python=11.8.3=pypi_0
cudf-cu11=23.10.0=pypi_0
cupy-cuda11x=12.2.0=pypi_0
cuspatial-cu11=23.10.0=pypi_0
fastrlock=0.8.2=pypi_0
fiona=1.9.5=pypi_0
fsspec=2023.10.0=pypi_0
geopandas=0.14.0=pypi_0
ld_impl_linux-64=2.38=h1181459_1
libffi=3.4.4=h6a678d5_0
libgcc-ng=11.2.0=h1234567_1
libgomp=11.2.0=h1234567_1
libstdcxx-ng=11.2.0=h1234567_1
libuuid=1.41.5=h5eee18b_0
llvmlite=0.40.1=pypi_0
ncurses=6.4=h6a678d5_0
numba=0.57.1=pypi_0
numpy=1.24.4=pypi_0
nvtx=0.2.8=pypi_0
openssl=3.0.11=h7f8727e_2
packaging=23.2=pypi_0
pandas=1.5.3=pypi_0
pip=23.3=py310h06a4308_0
protobuf=4.24.4=pypi_0
ptxcompiler-cu11=0.7.0.post1=pypi_0
pyarrow=12.0.1=pypi_0
pyproj=3.6.1=pypi_0
python=3.10.13=h955ad1f_0
python-dateutil=2.8.2=pypi_0
pytz=2023.3.post1=pypi_0
readline=8.2=h5eee18b_0
rmm-cu11=23.10.0=pypi_0
setuptools=68.0.0=py310h06a4308_0
shapely=2.0.2=pypi_0
six=1.16.0=pypi_0
sqlite=3.41.2=h5eee18b_0
tk=8.6.12=h1ccaba5_0
typing-extensions=4.8.0=pypi_0
tzdata=2023c=h04d1e81_0
wheel=0.41.2=py310h06a4308_0
xz=5.4.2=h5eee18b_0
zlib=1.2.13=h5eee18b_0
Other/Misc.
No response
Hi @danbull-lynker!
Thanks for submitting this issue - our team has been notified and we'll get back to you as soon as we can!
In the meantime, feel free to add any relevant information to this issue.
Hey @danbull-lynker - thanks for checking out cuSpatial!
There are a few things to talk about here, but the biggest is that cuSpatial today has only a limited subset of geopandas functionality that's been brought over to the GPU. Also, while we do our best to make the API the same as or very similar to geopandas, that's not always the case.
Specifically, dissolve, explode, polygon-polygon intersection and things like inherent length and area data in geometry objects aren't yet a part of cuSpatial.
A few thoughts:
- We make it easy to accelerate what we can, and then use geopandas for the rest using to_geopandas() and from_geopandas()
- Our user guide gives strong examples of our supported functions, and our API docs as well
- Things like linestring-linestring intersection, point-in-polygon or any of the DE9-IM binary predicates are highly accelerated in cuSpatial (and non-spatial data analytics are accelerated by cuDF)
- .apply() calls aren't strictly supported in cuSpatial and are inherited from cuDF so they can only be used for non-spatial algorithms
  - Even for cuDF supported calls, they need to be able to compile into a GPU kernel to be accelerated, here's the cuDF overview on user-defined functions for use with .apply()
- .apply() also in general isn't the most performant method and in geopandas itself you may find success attempting to remove it and replace with direct calls against the geoseries/geodataframes
  - It's usually faster to perform an operation against the entire series all at once and save the results in a new column, then use that new column to do downstream processing
This isn't quite a bug, but I'm happy to keep discussing this with you to see where we could fit cuSpatial into your workflow so I'm going to convert this to a discussion.
If there are specific features/functionality you'd like to see (e.g. dissolve), please submit a feature request for them!
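On the earlier point that .apply() functions must compile into a GPU kernel: the "Untyped global name 'gdf'" error in the log arises because the function reads the whole dataframe as a global, which Numba cannot type. As a rough sketch of the shape a compilable row function tends to take, the hypothetical function below reads only scalar fields of the current row. The column names area and landclass are assumptions, and a plain dict stands in for the row so the snippet runs without a GPU.

```python
def small_polygon_flag(row):
    # Only scalar values from the current row are used. No dataframe is
    # referenced from inside the function, unlike the global gdf lookup
    # that triggered the typing error in the log.
    if row["area"] < 1000:
        if row["landclass"] != 0:
            return 1
    return 0


# Plain dicts stand in for rows here. With cuDF, the assumption is that the
# same function would be passed as df.apply(small_polygon_flag, axis=1) on a
# frame holding numeric "area" and "landclass" columns.
rows = [
    {"area": 250.0, "landclass": 3},
    {"area": 5000.0, "landclass": 3},
    {"area": 250.0, "landclass": 0},
]
flags = [small_polygon_flag(r) for r in rows]
```

Anything that needs other rows, such as finding neighbours, has to happen outside the row function, for example by precomputing a numeric column first.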