apache / sedona

A cluster computing framework for processing large-scale geospatial data

Home Page: https://sedona.apache.org/

fieldNames - AttributeError: Not available before 1.0.0 sedona version

AlexTelloG opened this issue

Expected behavior

When creating a spatial_rdd from a PySpark DataFrame, I should be able to access the fieldNames attribute with the following command:

spatial_rdd.fieldNames

This should give me the names of the additional columns included in the Spark DataFrame.

Actual behavior

I can no longer access the fieldNames attribute of the resulting RDD. The following error appears:

```
AttributeError: Not available before 1.0.0 sedona version
```

This is surprising because the Sedona version in use is 1.4.1 or higher, not a release older than 1.0.0. The same code also worked without problems in previous versions of Sedona.

I ran into this issue today while migrating from Sedona 1.4 to the latest release, which deprecated the use of the SedonaRegistrator.

Steps to reproduce the problem

Create a spatial_rdd and attempt to get the fieldNames attribute:

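from sedona.utils.adapter import Adapter

# df stands for the source PySpark DataFrame (name chosen here for clarity)
spatial_rdd = Adapter.toSpatialRdd(df.select('local_id', 'location'), 'location')
spatial_rdd.analyze()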

print(f'showing spatial_rdd.fieldNames: {spatial_rdd.fieldNames}')
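As a stopgap while the attribute is unavailable, the same names could probably be derived from the DataFrame itself. The sketch below assumes that fieldNames is meant to hold every selected column except the geometry column, as the expected behavior above describes. The helper name non_geometry_columns is hypothetical and not part of Sedona.

```python
from typing import List


def non_geometry_columns(columns: List[str], geometry_column: str) -> List[str]:
    """Return the column names that are not the geometry column.

    Hypothetical helper: it mirrors what fieldNames is assumed to contain,
    it does not call into Sedona at all.
    """
    return [name for name in columns if name != geometry_column]


# With a real DataFrame this would be called as
# non_geometry_columns(df.columns, 'location').
field_names = non_geometry_columns(['local_id', 'location'], 'location')
print(f'showing derived field names: {field_names}')
```

This only works around the missing attribute on the Python side; it does not change what the underlying RDD stores.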

Settings

Sedona version = 1.4.1 or higher

Apache Spark version = 3.4.1 or higher

Apache Flink version = ?

API type = Python

Scala version =

JRE version = 1.8, 1.11?

Python version = 3.9

Environment = Standalone

This is part of the spark config for spark-submit:

--packages org.apache.sedona:sedona-spark-3.4_2.12:1.5.1,
org.datasyslab:geotools-wrapper:1.4.0-28.2,
uk.co.gresearch.spark:spark-extension_2.13:2.11.0-3.4

You can use the spark-shaded dependency org.apache.sedona:sedona-spark-shaded-3.4_2.12:1.5.1 instead of org.apache.sedona:sedona-spark-3.4_2.12:1.5.1.

The Sedona Python binding cannot determine the version number when the shaded jar is not being used (https://github.com/apache/sedona/blob/sedona-1.4.1/python/sedona/core/jvm/config.py#L207). This may be a bug that we should fix.
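To illustrate how this kind of failure can produce the misleading error message, here is a simplified sketch. It is not the code from config.py: the pattern, the function names detect_sedona_version and require_version, and the jar file names are all assumptions made for illustration. The idea is that a version lookup which only recognizes the shaded jar name returns nothing for the non-shaded jar, and a version gate then treats "unknown" the same as "too old".

```python
import re
from typing import List, Optional

# Hypothetical pattern: it only recognizes jar names containing "sedona-spark-shaded".
SHADED_JAR_PATTERN = re.compile(
    r"sedona-spark-shaded-[\d.]+_[\d.]+-(\d+\.\d+\.\d+)"
)


def detect_sedona_version(jar_names: List[str]) -> Optional[str]:
    """Return the first version found in the jar names, or None if none match."""
    for name in jar_names:
        match = SHADED_JAR_PATTERN.search(name)
        if match:
            return match.group(1)
    return None


def require_version(detected: Optional[str], minimum: str) -> None:
    """Raise AttributeError when the detected version is unknown or too old."""
    def as_tuple(version: str):
        return tuple(int(part) for part in version.split("."))

    if detected is None or as_tuple(detected) < as_tuple(minimum):
        raise AttributeError(f"Not available before {minimum} sedona version")


# Assumed jar file names, for illustration only.
shaded = ["org.apache.sedona_sedona-spark-shaded-3.4_2.12-1.5.1.jar"]
plain = ["org.apache.sedona_sedona-spark-3.4_2.12-1.5.1.jar"]

require_version(detect_sedona_version(shaded), "1.0.0")  # passes silently
try:
    require_version(detect_sedona_version(plain), "1.0.0")
except AttributeError as error:
    print(error)
```

Under these assumptions, the non-shaded jar yields no version at all, so the check fails even though the installed release is newer than 1.0.0.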

commented

I have run into two bugs today and am documenting them here for your consideration.

  1. With the shaded dependency, specifically org.apache.sedona:sedona-spark-shaded-3.4_2.12:1.5.1, I run into another issue, an unresolved dependency: edu.ucar#cdm-core;5.4.2: not found. This appears to be a known and reported bug. The workaround for now seems to be downgrading to 1.4.1.

  2. The same seems to happen when reproducing the Spark setup from your notebooks, for example the one located here.

@AlexTelloG Since you are using standalone Spark with the --packages option, you can append one more option: --repositories https://artifacts.unidata.ucar.edu/repository/unidata-all. This will solve the edu.ucar#cdm-core;5.4.2: not found issue.
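For sessions created from Python rather than through spark-submit, the same settings can likely be expressed with the Spark properties spark.jars.packages and spark.jars.repositories, which correspond to --packages and --repositories. The following is a minimal sketch using the coordinates quoted in this thread; the helper name build_spark_conf is hypothetical, and the part that needs pyspark is left commented out.

```python
from typing import Dict, List


def build_spark_conf(packages: List[str], repositories: List[str]) -> Dict[str, str]:
    """Build the Spark properties equivalent to --packages and --repositories."""
    return {
        "spark.jars.packages": ",".join(packages),
        "spark.jars.repositories": ",".join(repositories),
    }


conf = build_spark_conf(
    packages=[
        "org.apache.sedona:sedona-spark-shaded-3.4_2.12:1.5.1",
        "org.datasyslab:geotools-wrapper:1.4.0-28.2",
    ],
    repositories=["https://artifacts.unidata.ucar.edu/repository/unidata-all"],
)

# Applying the settings requires pyspark to be installed:
# from pyspark.sql import SparkSession
# builder = SparkSession.builder
# for key, value in conf.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
```

These properties have to be set before the session is created, since the packages are resolved at startup.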

commented

Thank you so much for the quick replies and help, I really appreciate it. Thanks also for the amazing work on this library!