rapidsai / cuspatial

CUDA-accelerated GIS and spatiotemporal algorithms

Home page: https://docs.rapids.ai/api/cuspatial/stable/

[BUG]: cuspatial Numba JIT compiler cannot pick up the geopandas table when using the apply function

danbull-lynker opened this issue

Version

23.10.0

On which installation method(s) does this occur?

Source

Describe the issue

I have built a function that I apply to each row of a geopandas table. The function finds the neighbouring polygon with the longest shared boundary and updates the 'landclass' field accordingly, which then allows polygons to be dissolved based on the updated values. A few other selection parameters are applied, i.e., some rows are included in or excluded from being selected as a merge target. This works well in geopandas, but it is slow once the shapefiles grow beyond 1000 records, so cuSpatial could be useful since I have a lot of processing to do.
Using cuSpatial, I'm getting the following error:

Minimum reproducible example

import os
import glob
import geopandas as gpd
from timeit import default_timer as timer
import cuspatial

rasters = '/home/dan/projects/COWetland/tmp/composite15_comp_masked/'
vector_output = '/home/dan/projects/COWetland/tmp/composite15_comp_vectors3/'
inferences = glob.glob(rasters + '/m_3910839_ne*.tif')
composite = True

## gets the landclass of the neighbouring polygon with the longest shared side
def getlongestsidelandclass(row):
    ## arguments
    min_size, includepolys, includetarget, excludetarget = 1000, [0, 3, 4, 5, 6], [3, 4, 5, 6], []
    landclass = row.landclass
    if row.geometry.area < min_size and row.landclass in includepolys:
        # reads the global gdf from inside the row function
        target_polys = gdf[gdf.id != row.id]
        if len(excludetarget) > 0:
            exc_target_polys = target_polys[target_polys.landclass.isin(excludetarget)]
            exc_neighbors = exc_target_polys.geometry.intersection(row['geometry'])
            exc_blength = exc_neighbors.length
            exc_blength = exc_blength[exc_blength != 0].shape[0]
        else:
            exc_blength = 0
        if exc_blength == 0:
            inc_target_polys = target_polys[target_polys.landclass.isin(includetarget)]
            # length of the boundary shared with each candidate neighbour
            neighbors = inc_target_polys.geometry.intersection(row['geometry'])
            blength = neighbors.length
            if blength[blength != 0].shape[0] > 0:
                # index label of the neighbour with the longest shared boundary
                maxx = blength.idxmax()
                landclass = gdf.loc[maxx, 'landclass']
    return landclass


for inference in inferences:
    vector_tmp = os.path.join(vector_output, os.path.basename(inference)[:-4] + '_tmp.shp')

    gdfx = gpd.read_file(vector_tmp)
    gdf = cuspatial.GeoDataFrame(gdfx)
    #gdf = cuspatial.from_geopandas(gdfx)
    gdf["id"] = gdf.index

    if composite:
        print('removing wetlands < 1000 m2')
        start = timer()
        gdf["landclass"]= gdf.apply(getlongestsidelandclass, axis=1)
        gdf = gdf.dissolve(by='landclass', as_index=False)
        gdf = gdf.explode(index_parts=False)
        gdf["id"] = gdf.index
        fin =  timer()
        print("--- %s seconds ---" % str(fin - start))
    
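    gdf = gdf[gdf.landclass != 0]
    if not gdf.empty:
        vector = os.path.join(vector_output, os.path.basename(inference)[:-4] + '.shp')  # hypothetical output path
        gdf.to_file(vector)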

Relevant log output

(rapids3) dan@ox:~/projects/COWetland/code/create_wetland$ python vector_reduce.py
removing wetlands < 1000m2
Traceback (most recent call last):
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/cudf/core/indexed_frame.py", line 2359, in _apply
    kernel, retty = _compile_or_get(
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/nvtx/nvtx.py", line 115, in inner
    result = func(*args, **kwargs)
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/cudf/core/udf/utils.py", line 268, in _compile_or_get
    kernel, scalar_return_type = kernel_getter(frame, func, args)
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/cudf/core/udf/row_function.py", line 143, in _get_row_kernel
    scalar_return_type = _get_udf_return_type(row_type, func, args)
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/nvtx/nvtx.py", line 115, in inner
    result = func(*args, **kwargs)
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/cudf/core/udf/utils.py", line 88, in _get_udf_return_type
    ptx, output_type = cudautils.compile_udf(func, compile_sig)
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/cudf/utils/cudautils.py", line 126, in compile_udf
    ptx_code, return_type = cuda.compile_ptx_for_current_device(
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/cuda/compiler.py", line 319, in compile_ptx_for_current_device
    return compile_ptx(pyfunc, sig, debug=debug, lineinfo=lineinfo,
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/cuda/compiler.py", line 289, in compile_ptx
    cres = compile_cuda(pyfunc, return_type, args, debug=debug,
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/cuda/compiler.py", line 230, in compile_cuda
    cres = compiler.compile_extra(typingctx=typingctx,
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/compiler.py", line 762, in compile_extra
    return pipeline.compile_extra(func)
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/compiler.py", line 460, in compile_extra
    return self._compile_bytecode()
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/compiler.py", line 528, in _compile_bytecode
    return self._compile_core()
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/compiler.py", line 507, in _compile_core
    raise e
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/compiler.py", line 494, in _compile_core
    pm.run(self.state)
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/compiler_machinery.py", line 368, in run
    raise patched_exception
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/compiler_machinery.py", line 356, in run
    self._runPass(idx, pass_inst, state)
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/compiler_machinery.py", line 311, in _runPass
    mutated |= check(pss.run_pass, internal_state)
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/compiler_machinery.py", line 273, in check
    mangled = func(compiler_state)
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/typed_passes.py", line 110, in run_pass
    typemap, return_type, calltypes, errs = type_inference_stage(
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/typed_passes.py", line 86, in type_inference_stage
    infer.build_constraint()
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/typeinfer.py", line 1039, in build_constraint
    self.constrain_statement(inst)
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/typeinfer.py", line 1386, in constrain_statement
    self.typeof_assign(inst)
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/typeinfer.py", line 1461, in typeof_assign
    self.typeof_global(inst, inst.target, value)
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/typeinfer.py", line 1561, in typeof_global
    typ = self.resolve_value_type(inst, gvar.value)
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/numba/core/typeinfer.py", line 1482, in resolve_value_type
    raise TypingError(msg, loc=inst.loc)
numba.core.errors.TypingError: Failed in cuda mode pipeline (step: nopython frontend)
Untyped global name 'gdf': Cannot determine Numba type of <class 'cuspatial.core.geodataframe.GeoDataFrame'>

File "vector_reduce.py", line 24:
def getlongestsidelandclass(row):
    <source elided>
    if row.geometry.area<min_size and row.landclass in includepolys:
        target_polys = gdf[gdf.id !=row.id]
        ^


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/dan/projects/COWetland/code/create_wetland/vector_reduce.py", line 56, in <module>
    gdf["landclass"]= gdf.apply(getlongestsidelandclass, axis=1)
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/nvtx/nvtx.py", line 115, in inner
    result = func(*args, **kwargs)
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/cudf/core/dataframe.py", line 4388, in apply
    return self._apply(func, _get_row_kernel, *args, **kwargs)
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/nvtx/nvtx.py", line 115, in inner
    result = func(*args, **kwargs)
  File "/home/dan/anaconda3/envs/rapids3/lib/python3.10/site-packages/cudf/core/indexed_frame.py", line 2363, in _apply
    raise ValueError(
ValueError: user defined function compilation failed.

Environment details

# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
_libgcc_mutex=0.1=main
_openmp_mutex=5.1=1_gnu
attrs=23.1.0=pypi_0
bzip2=1.0.8=h7b6447c_0
ca-certificates=2023.08.22=h06a4308_0
cachetools=5.3.2=pypi_0
certifi=2023.7.22=pypi_0
click=8.1.7=pypi_0
click-plugins=1.1.1=pypi_0
cligj=0.7.2=pypi_0
cubinlinker-cu11=0.3.0.post1=pypi_0
cuda-python=11.8.3=pypi_0
cudf-cu11=23.10.0=pypi_0
cupy-cuda11x=12.2.0=pypi_0
cuspatial-cu11=23.10.0=pypi_0
fastrlock=0.8.2=pypi_0
fiona=1.9.5=pypi_0
fsspec=2023.10.0=pypi_0
geopandas=0.14.0=pypi_0
ld_impl_linux-64=2.38=h1181459_1
libffi=3.4.4=h6a678d5_0
libgcc-ng=11.2.0=h1234567_1
libgomp=11.2.0=h1234567_1
libstdcxx-ng=11.2.0=h1234567_1
libuuid=1.41.5=h5eee18b_0
llvmlite=0.40.1=pypi_0
ncurses=6.4=h6a678d5_0
numba=0.57.1=pypi_0
numpy=1.24.4=pypi_0
nvtx=0.2.8=pypi_0
openssl=3.0.11=h7f8727e_2
packaging=23.2=pypi_0
pandas=1.5.3=pypi_0
pip=23.3=py310h06a4308_0
protobuf=4.24.4=pypi_0
ptxcompiler-cu11=0.7.0.post1=pypi_0
pyarrow=12.0.1=pypi_0
pyproj=3.6.1=pypi_0
python=3.10.13=h955ad1f_0
python-dateutil=2.8.2=pypi_0
pytz=2023.3.post1=pypi_0
readline=8.2=h5eee18b_0
rmm-cu11=23.10.0=pypi_0
setuptools=68.0.0=py310h06a4308_0
shapely=2.0.2=pypi_0
six=1.16.0=pypi_0
sqlite=3.41.2=h5eee18b_0
tk=8.6.12=h1ccaba5_0
typing-extensions=4.8.0=pypi_0
tzdata=2023c=h04d1e81_0
wheel=0.41.2=py310h06a4308_0
xz=5.4.2=h5eee18b_0
zlib=1.2.13=h5eee18b_0

Other/Misc.

No response

Hi @danbull-lynker!

Thanks for submitting this issue - our team has been notified and we'll get back to you as soon as we can!
In the meantime, feel free to add any relevant information to this issue.

Hey @danbull-lynker - thanks for checking out cuSpatial!

There are a few things to discuss here, but the biggest is that cuSpatial currently supports only a limited subset of geopandas functionality on the GPU. Also, while we do our best to keep the API the same as or very similar to geopandas, that's not always the case.

Specifically, dissolve, explode, polygon-polygon intersection, and built-in length and area properties on geometry objects aren't yet part of cuSpatial.

A few thoughts:

  • We make it easy to accelerate what we can on the GPU and use geopandas for the rest, converting between the two with to_geopandas() and from_geopandas()
    • Our user guide and API docs both have good examples of the supported functions
    • Things like linestring-linestring intersection, point-in-polygon or any of the DE9-IM binary predicates are highly accelerated in cuSpatial (and non-spatial data analytics are accelerated by cuDF)
    • .apply() isn't directly supported by cuSpatial; it's inherited from cuDF, so it can only be used for non-spatial computations
      • Even for cuDF-supported calls, the function has to compile into a GPU kernel to be accelerated; see the cuDF overview on user-defined functions for use with .apply(). This is why the function above fails: it reads the global gdf GeoDataFrame, which Numba cannot type
      • .apply() also isn't generally the most performant approach; even in geopandas itself, you may have success removing it and replacing it with direct calls on the GeoSeries/GeoDataFrames
        • It's usually faster to perform an operation against the entire series all at once and save the results in a new column, then use that new column to do downstream processing
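As an illustration of the whole-column approach, here is a minimal pandas sketch of the rule in getlongestsidelandclass, written without a row-wise .apply(). It assumes, hypothetically, that per-polygon areas and a table of touching pairs with their shared-boundary lengths have already been computed elsewhere (for example on the CPU with geopandas). The function name reassign_small_polygons and the column names id, area, landclass, nbr_id and shared_len are illustrative only, not part of any cuSpatial or cuDF API. The exclude-target check is left out because excludetarget is empty in the original.

```python
import pandas as pd


def reassign_small_polygons(polys, pairs, min_size=1000.0,
                            include_polys=(0, 3, 4, 5, 6),
                            include_target=(3, 4, 5, 6)):
    """Hypothetical vectorized version of getlongestsidelandclass.

    polys: one row per polygon, columns id, area, landclass (assumed layout).
    pairs: one row per touching pair, columns id, nbr_id, shared_len,
           assumed to be precomputed (e.g. on the CPU with geopandas).
    """
    # Step 1: decide for every polygon at once whether it is a merge candidate.
    is_small = (polys["area"] < min_size) & polys["landclass"].isin(include_polys)
    small = polys.loc[is_small, ["id"]]

    # Step 2: attach each neighbour's landclass, keep eligible targets with a
    # non-zero shared boundary, and restrict to the small polygons.
    nbr_class = polys[["id", "landclass"]].rename(
        columns={"id": "nbr_id", "landclass": "nbr_class"})
    cand = pairs.merge(nbr_class, on="nbr_id")
    cand = cand[cand["nbr_class"].isin(include_target) & (cand["shared_len"] > 0)]
    cand = cand.merge(small, on="id")

    # Step 3: per small polygon, pick the neighbour with the longest shared boundary.
    best = cand.loc[cand.groupby("id")["shared_len"].idxmax(), ["id", "nbr_class"]]

    # Step 4: save the result as a new column; polygons without an eligible
    # neighbour keep their original landclass.
    out = polys.merge(best, on="id", how="left")
    out["new_landclass"] = (out["nbr_class"].fillna(out["landclass"])
                            .astype(polys["landclass"].dtype))
    return out.drop(columns="nbr_class")
```

Each step operates on entire columns, and the result lands in a new column that a later dissolve can group by. Because it relies only on merge, isin, groupby(...).idxmax() and fillna, a cuDF port may be feasible since cuDF mirrors much of the pandas API, but that hasn't been checked here.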

This isn't quite a bug, but I'm happy to keep discussing this with you to see where cuSpatial could fit into your workflow, so I'm going to convert this to a discussion.

If there are specific features or functionality you'd like to see (e.g., dissolve), please submit a feature request for them!