generateEncoder() fails for data class with ByteArray field
mlin opened this issue
I have a data class containing a ByteArray blob field. When I try to work with a Dataset of these, I get (kotlin-spark-api v1.0.2, Spark v3.1.2):
Exception in thread "main" java.lang.ClassCastException: class org.apache.spark.sql.types.BinaryType$ cannot be cast to class org.apache.spark.sql.types.ObjectType (org.apache.spark.sql.types.BinaryType$ and org.apache.spark.sql.types.ObjectType are in unnamed module of loader 'app')
at org.apache.spark.sql.KotlinReflection$.toCatalystArray$1(KotlinReflection.scala:609)
at org.apache.spark.sql.KotlinReflection$.$anonfun$serializerFor$1(KotlinReflection.scala:788)
at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69)
at org.apache.spark.sql.KotlinReflection.cleanUpReflectionObjects(KotlinReflection.scala:1012)
at org.apache.spark.sql.KotlinReflection.cleanUpReflectionObjects$(KotlinReflection.scala:1011)
at org.apache.spark.sql.KotlinReflection$.cleanUpReflectionObjects(KotlinReflection.scala:47)
at org.apache.spark.sql.KotlinReflection$.serializerFor(KotlinReflection.scala:591)
at org.apache.spark.sql.KotlinReflection$.$anonfun$serializerFor$16(KotlinReflection.scala:761)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
at org.apache.spark.sql.KotlinReflection$.$anonfun$serializerFor$1(KotlinReflection.scala:748)
at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69)
at org.apache.spark.sql.KotlinReflection.cleanUpReflectionObjects(KotlinReflection.scala:1012)
at org.apache.spark.sql.KotlinReflection.cleanUpReflectionObjects$(KotlinReflection.scala:1011)
at org.apache.spark.sql.KotlinReflection$.cleanUpReflectionObjects(KotlinReflection.scala:47)
at org.apache.spark.sql.KotlinReflection$.serializerFor(KotlinReflection.scala:591)
at org.apache.spark.sql.KotlinReflection$.serializerFor(KotlinReflection.scala:578)
at org.apache.spark.sql.KotlinReflection.serializerFor(KotlinReflection.scala)
at org.jetbrains.kotlinx.spark.api.ApiV1Kt.kotlinClassEncoder(ApiV1.kt:180)
at org.jetbrains.kotlinx.spark.api.ApiV1Kt.generateEncoder(ApiV1.kt:167)
...
Artificial repro is merely:

import org.jetbrains.kotlinx.spark.api.*

data class BlobTest(val blob: ByteArray) {
    constructor(str: String) : this(str.toByteArray())
}

fun main() {
    withSpark {
        dsOf(BlobTest("foo"), BlobTest("bar"))
    }
}
Seems like this is the offending cast, where input.dataType is a BinaryType, which is not an ObjectType (but rather a sibling of it inheriting from DataType). I wonder if any other primitive array would suffer the same?
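Until a fixed release is available, one possible workaround (my own sketch, not something from this thread; the BlobRow wrapper and the Base64 round-trip are assumptions for illustration) is to keep the persisted field as a String, which the encoder already handles, and convert to/from ByteArray at the edges:

```kotlin
import java.util.Base64

// Hypothetical workaround: persist the blob as a Base64 String (a type the
// encoder supports) and expose the raw bytes via a computed property.
data class BlobRow(val blobB64: String) {
    // Decode the stored Base64 text back into the original bytes on demand.
    val blob: ByteArray
        get() = Base64.getDecoder().decode(blobB64)

    companion object {
        // Build a row from raw bytes by encoding them as Base64 text.
        fun of(bytes: ByteArray): BlobRow =
            BlobRow(Base64.getEncoder().encodeToString(bytes))
    }
}

fun main() {
    val row = BlobRow.of("foo".toByteArray())
    println(String(row.blob)) // prints "foo"
}
```

This sidesteps the cast in KotlinReflection at the cost of a Base64 size overhead, so it is only a stopgap until the encoder handles ByteArray directly.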
We fixed the BinaryType support in PR #134. It should be working fine now :)
The next release will contain the fix; until then, you can try the https://github.com/JetBrains/kotlin-spark-api/tree/spark-3.2 branch and see if it works for you.
Thanks, that's good news! I'll give it a shot when I'm able.
Can I close this?