apache / sedona

A cluster computing framework for processing large-scale geospatial data

Home Page: https://sedona.apache.org/

Hidden requirement for geopandas in apache-sedona[spark] 1.5.2

joonaspessi opened this issue

Expected behavior

When installing Sedona for PySpark with the apache-sedona[spark] package, we expect all required dependencies to be installed, and geopandas should not be needed unless Kepler or pydeck visualizations are used.
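A common way to meet this expectation is to import optional visualization modules defensively, so a missing extra disables one feature instead of breaking the whole package import. The sketch below uses a hypothetical helper, optional_import, which is not part of Sedona's API and is not necessarily how 1.5.3 fixed the bug. The module path sedona.maps.SedonaKepler is taken from the traceback in this report.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Import `name` if possible; return None when it, or one of its own
    dependencies, is not installed.

    Hypothetical helper for illustration, not part of Sedona's API.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # transitive dependency (e.g. geopandas) is caught here as well.
        return None


# Hypothetical package-level usage: expose the map helper only when its
# import succeeds, so a star import never hard-fails on a missing extra.
_kepler_module = optional_import("sedona.maps.SedonaKepler")
SedonaKepler = getattr(_kepler_module, "SedonaKepler", None)
```

Catching ImportError also covers a dependency that is installed but fails to import for another import-related reason; the trade-off is that such failures become silent unless they are logged.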

Actual behavior

After installing apache-sedona[spark], running from sedona.spark import * fails with ModuleNotFoundError: No module named 'geopandas'.

Steps to reproduce the problem

Create a clean Python environment and run the following commands:

$ pip install "apache-sedona[spark]"==1.5.2
$ python
Python 3.8.18 (default, Feb 13 2024, 15:47:05)
[Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from sedona.spark import *
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/python/versions/3.8.18/lib/python3.8/site-packages/sedona/spark/__init__.py", line 44, in <module>
    from sedona.maps.SedonaKepler import SedonaKepler
  File "/python/versions/3.8.18/lib/python3.8/site-packages/sedona/maps/SedonaKepler.py", line 18, in <module>
    from sedona.maps.SedonaMapUtils import SedonaMapUtils
  File "/python/versions/3.8.18/lib/python3.8/site-packages/sedona/maps/SedonaMapUtils.py", line 19, in <module>
    import geopandas as gpd
ModuleNotFoundError: No module named 'geopandas'
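To see which optional visualization dependencies are absent from an environment like the one above, importlib.util.find_spec can locate a module without importing it. This is a minimal sketch; the module list is an assumption (geopandas comes from the traceback, while keplergl and pydeck are the usual top-level module names of the Kepler and pydeck packages).

```python
import importlib.util
from typing import Iterable, List

# Assumption: optional modules behind Sedona's map helpers.
OPTIONAL_MODULES = ("geopandas", "keplergl", "pydeck")


def find_missing(names: Iterable[str]) -> List[str]:
    """Return the names that cannot be found, without importing them.

    Pass only top-level names: for a dotted name, find_spec would
    import the parent package first.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


print("missing optional modules:", find_missing(OPTIONAL_MODULES))
```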

Settings

Sedona version = 1.5.2

Apache Spark version = 3.5.1

Apache Flink version = ?

API type = Python

Scala version = N/A

JRE version = 1.8

Python version = 3.8

Environment = Standalone

@joonaspessi Sorry, this PR accidentally introduced the issue: #1229

To work around this problem, instead of using from sedona.spark import *, please use from sedona.spark.SedonaContext import SedonaContext

Hello, thanks for the fast response!

I think Python will still load the __init__.py file of the sedona.spark package even when importing a submodule of sedona.spark, so this workaround hits the same geopandas import.
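This is standard Python import behavior: importing a.b first imports the package a, which executes a/__init__.py. The self-contained sketch below reproduces it with a throwaway package; demo_pkg, context.py and no_such_optional_dependency are made-up names standing in for sedona.spark, SedonaContext and geopandas.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def parent_init_runs_on_submodule_import() -> bool:
    """Return True if importing only a submodule still executes the
    package's __init__.py (shown with a throwaway package)."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"
        pkg.mkdir()
        # Like sedona/spark/__init__.py in 1.5.2, the package __init__
        # imports a dependency that is not installed.
        (pkg / "__init__.py").write_text("import no_such_optional_dependency\n")
        # The submodule itself has no problematic imports.
        (pkg / "context.py").write_text("class Context:\n    pass\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("demo_pkg.context")
            return False  # the submodule loaded without running __init__.py
        except ModuleNotFoundError as exc:
            # The error names the dependency from __init__.py, not the submodule.
            return exc.name == "no_such_optional_dependency"
        finally:
            sys.path.remove(tmp)
            for mod in ("demo_pkg.context", "demo_pkg"):
                sys.modules.pop(mod, None)


print(parent_init_runs_on_submodule_import())  # True
```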

This is very sad. We will make a follow-up release to fix this bug.

We have released 1.5.3 to fix this bug! @joonaspessi
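To confirm that an environment already has the fixed release, the installed version of the apache-sedona distribution (the name used in the pip command above) can be read with importlib.metadata, available since Python 3.8. A minimal sketch, assuming later releases keep the fix; the parser is deliberately simplified and treats pre-releases such as 1.5.3rc1 as 1.5.3, so packaging.version would be the stricter choice.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

FIXED_IN = (1, 5, 3)  # release named in the maintainers' comment


def parse_release(ver: str) -> Tuple[int, ...]:
    """Turn '1.5.3' (or '1.5.3.post1') into (1, 5, 3); ignores pre-release tags."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", ver)
    if match is None:
        raise ValueError(f"unrecognized version: {ver!r}")
    return tuple(int(part) for part in match.groups())


def has_geopandas_import_fix(ver: str) -> bool:
    """True if `ver` is 1.5.3 or later (assumes later releases keep the fix)."""
    return parse_release(ver) >= FIXED_IN


def installed_sedona_version() -> Optional[str]:
    """Installed apache-sedona version, or None if it is not installed."""
    try:
        return version("apache-sedona")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    ver = installed_sedona_version()
    if ver is None:
        print("apache-sedona is not installed")
    elif has_geopandas_import_fix(ver):
        print(ver, "includes the fix")
    else:
        print(ver, "is affected; upgrade to 1.5.3 or later")
```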