Kotlin / kotlin-spark-api

This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this become a part of Apache Spark 3.x

Broadcasting variables

Jolanrensen opened this issue

Currently SparkContext.broadcast(variable) does not work from Kotlin because the compiler cannot supply the implicit ClassTag parameter (it shows up as evidence$9). The only way to make it work is to go through JavaSparkContext, which uses something called a fakeClassTag to provide that parameter.
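For reference, a minimal workaround along those lines is to wrap the SparkContext in a JavaSparkContext yourself (the session setup and example values below are purely illustrative):

import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.sql.SparkSession

fun main() {
    val spark = SparkSession.builder()
        .master("local[*]")
        .appName("broadcast-workaround")  // illustrative app name
        .getOrCreate()

    // JavaSparkContext supplies the missing ClassTag (via fakeClassTag) internally,
    // so broadcast() can be called from Kotlin without extra parameters.
    val jsc = JavaSparkContext(spark.sparkContext())
    val lookup: Broadcast<Map<String, Int>> = jsc.broadcast(mapOf("a" to 1, "b" to 2))
    println(lookup.value()["a"])  // prints 1

    spark.stop()
}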

I suspect it can be solved by adding this extension function to the API, although it needs to be tested of course:

fun <T> SparkContext.broadcast(value: T): Broadcast<T> = broadcast(value, JavaSparkContext.fakeClassTag())

Actually, probably something like encoder<T>().clsTag() should be used instead of the fakeClassTag.

After testing using

inline fun <reified T> SparkContext.broadcast(value: T): Broadcast<T> = broadcast(value, encoder<T>().clsTag())

I found it to work quite well with primitive types, arrays and serializable classes. (Data) classes don't work, though. They would probably need to be converted to something serializable before being broadcast, meaning a wrapper of some sort would be needed.

Edit: Oh, as long as the (data) class implements java.io.Serializable, it's fine! I have no problems broadcasting data class SomeClass(val a: IntArray, val b: Int) : Serializable
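For example, broadcasting such a Serializable data class with the reified extension sketched above could look like this (the import for the API's encoder<T>() helper is an assumption, as its package has varied between versions):

import java.io.Serializable
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.sql.SparkSession
import org.jetbrains.kotlinx.spark.api.*  // assumed package for encoder<T>()

// The reified extension proposed above.
inline fun <reified T> SparkContext.broadcast(value: T): Broadcast<T> =
    broadcast(value, encoder<T>().clsTag())

// A Serializable (data) class, as in the comment above.
data class SomeClass(val a: IntArray, val b: Int) : Serializable

fun main() {
    val spark = SparkSession.builder()
        .master("local[*]")
        .appName("broadcast-data-class")  // illustrative app name
        .getOrCreate()

    val broadcast = spark.sparkContext().broadcast(SomeClass(intArrayOf(1, 2, 3), 4))
    println(broadcast.value().b)  // prints 4

    spark.stop()
}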

@Jolanrensen thank you for the report! Do you think it would be possible for you to provide us with a simple(ish) test case to check that it works correctly? Thank you!
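A rough sketch of what such a test could look like (JUnit 5 is used here purely for illustration, the test name and values are made up, and it assumes the reified broadcast extension above is in scope; the project's actual test setup may differ):

import java.io.Serializable
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.sql.SparkSession
import org.junit.jupiter.api.Assertions.assertEquals
import org.junit.jupiter.api.Test

data class Counter(val amount: Int) : Serializable

class BroadcastTest {
    @Test
    fun `broadcast of a serializable data class survives a task`() {
        val spark = SparkSession.builder()
            .master("local[1]")
            .appName("broadcast-test")
            .getOrCreate()
        try {
            // Assumes the reified SparkContext.broadcast extension proposed above is in scope.
            val broadcast = spark.sparkContext().broadcast(Counter(40))

            // Read the broadcast value inside a task so it actually gets shipped to executors.
            val jsc = JavaSparkContext(spark.sparkContext())
            val result = jsc.parallelize(listOf(1, 2, 3))
                .map { it + broadcast.value().amount }
                .collect()

            assertEquals(listOf(41, 42, 43), result)
        } finally {
            spark.stop()
        }
    }
}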

Added and I see it's merged, cool!