fieldNames - AttributeError: Not available before 1.0.0 sedona version
AlexTelloG opened this issue · comments
Expected behavior
When creating a SpatialRDD from a PySpark DataFrame, I should be able to access the fieldNames attribute with the following command:
spatial_rdd.fieldNames
This should give me the names of the additional (non-geometry) columns included in the Spark DataFrame.
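For illustration, the expected value can be sketched in plain Python. This assumes fieldNames should hold every selected column except the geometry column, in selection order; `expected_field_names` is a hypothetical helper and not part of Sedona.

```python
from typing import List


def expected_field_names(columns: List[str], geometry_column: str) -> List[str]:
    """Return the non-geometry column names, preserving their order.

    Hypothetical helper: it only models what this report expects
    spatial_rdd.fieldNames to contain; it does not call Sedona.
    """
    return [name for name in columns if name != geometry_column]


# Columns taken from the select(...) call in the reproduction steps.
print(expected_field_names(["local_id", "location"], "location"))
```

With the columns used in this report, the helper returns a list containing only local_id.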
Actual behavior
I can no longer access the fieldNames attribute of the resulting RDD. The following error appears:
```
AttributeError: Not available before 1.0.0 sedona version
```
This is odd because the Sedona version being used is not older than 1.0.0; it is 1.4.1 or higher. Also, this used to work without problems in previous versions of Sedona.
I ran into this issue today while migrating from Sedona 1.4 to the latest release, which deprecated the use of SedonaRegistrator.
Steps to reproduce the problem
Create a spatial_rdd and attempt to get the fieldNames attribute:
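from sedona.utils.adapter import Adapter
# On the right-hand side, spatial_rdd is still the source DataFrame; 'location' is its geometry column.
spatial_rdd = Adapter.toSpatialRdd(spatial_rdd.select('local_id', 'location'), 'location')
spatial_rdd.analyze()
print(f'showing spatial_rdd.fieldNames: {spatial_rdd.fieldNames}')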
Settings
Sedona version = 1.4.1 or higher
Apache Spark version = 3.4.1 or higher
Apache Flink version = ?
API type = Python
Scala version =
JRE version = 1.8, 1.11?
Python version = 3.9
Environment = Standalone
This is part of the spark config for spark-submit:
--packages org.apache.sedona:sedona-spark-3.4_2.12:1.5.1,
org.datasyslab:geotools-wrapper:1.4.0-28.2,
uk.co.gresearch.spark:spark-extension_2.13:2.11.0-3.4
You can use the spark-shaded dependency org.apache.sedona:sedona-spark-shaded-3.4_2.12:1.5.1 instead of org.apache.sedona:sedona-spark-3.4_2.12:1.5.1.
The Sedona Python binding cannot determine the version number when the shaded jar is not being used (https://github.com/apache/sedona/blob/sedona-1.4.1/python/sedona/core/jvm/config.py#L207). This may be a bug that we should fix.
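To illustrate how version detection based on jar file names can miss the unshaded artifact, here is a minimal sketch. The function name and the regular expression are assumptions made for this example; they are not copied from Sedona's config.py.

```python
import re
from typing import Optional

# Assumed pattern: only file names of the shaded artifact are recognised.
_SHADED_JAR = re.compile(r"sedona-spark-shaded-[\d.]+_[\d.]+-(\d+\.\d+\.\d+)")


def guess_sedona_version(jar_names: str) -> Optional[str]:
    """Extract a Sedona version from a comma-separated list of jar names.

    Returns None when no shaded Sedona jar is present in the list.
    """
    match = _SHADED_JAR.search(jar_names)
    return match.group(1) if match else None


shaded = "sedona-spark-shaded-3.4_2.12-1.5.1.jar,geotools-wrapper-1.4.0-28.2.jar"
unshaded = "sedona-spark-3.4_2.12-1.5.1.jar,geotools-wrapper-1.4.0-28.2.jar"
print(guess_sedona_version(shaded))    # version found for the shaded jar
print(guess_sedona_version(unshaded))  # nothing found for the unshaded jar
```

If a caller treats a missing version as "older than 1.0.0", a lookup like this could plausibly produce the AttributeError reported above even on 1.4.1 or later.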
I have run into two bugs today, and I am documenting them here for your consideration.
- With the shaded dependency, specifically org.apache.sedona:sedona-spark-shaded-3.4_2.12:1.5.1, I run into another issue: the unresolved dependency edu.ucar#cdm-core;5.4.2: not found. This appears to be a known and reported bug. The solution for now seems to be downgrading to 1.4.1.
- The same seems to happen when reproducing the Spark setup from your notebooks, for example the one located here.
@AlexTelloG Since you are using standalone Spark with the --packages option, you can append one more option, --repositories https://artifacts.unidata.ucar.edu/repository/unidata-all, which will solve the edu.ucar#cdm-core;5.4.2: not found issue.
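For anyone configuring Spark from Python rather than through spark-submit, the same idea can be sketched with Spark's `spark.jars.packages` and `spark.jars.repositories` settings. The helper names below are hypothetical, and the package coordinates are copied from this thread rather than verified as a recommended combination.

```python
from typing import Dict

# Extra repository suggested in this thread for resolving edu.ucar#cdm-core.
UNIDATA_REPO = "https://artifacts.unidata.ucar.edu/repository/unidata-all"


def sedona_spark_conf() -> Dict[str, str]:
    """Build Spark settings equivalent to --packages plus --repositories."""
    packages = [
        "org.apache.sedona:sedona-spark-shaded-3.4_2.12:1.5.1",
        "org.datasyslab:geotools-wrapper:1.4.0-28.2",
    ]
    return {
        "spark.jars.packages": ",".join(packages),
        "spark.jars.repositories": UNIDATA_REPO,
    }


def build_session():
    """Create a SparkSession with the settings above (requires pyspark)."""
    from pyspark.sql import SparkSession  # imported lazily so the sketch loads without pyspark

    builder = SparkSession.builder
    for key, value in sedona_spark_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Calling build_session() needs a working PySpark installation and network access to both Maven Central and the Unidata repository.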
Thank you so much for the quick replies and help; I really appreciate it. Thanks also for the amazing work on this library!