Kotlin / kotlin-spark-api

This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this become part of Apache Spark 3.x.

Default value in SparkHelper overrides config for spark.master

christopherfrieler opened this issue

Currently org.jetbrains.kotlinx.spark.api.withSpark(props, master, appName, logLevel, func) provides a default value for the parameter master. This overrides the value from the external spark config provided with the spark-submit command.
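
For reference, the helper's signature is roughly the following (a simplified sketch, not the exact source; the appName and logLevel defaults shown here are just illustrative):

fun withSpark(
    props: Map<String, Any> = emptyMap(),
    master: String = "local[*]",   // this default silently wins over spark-submit's --master
    appName: String = "Kotlin Spark API",
    logLevel: SparkLogLevel = SparkLogLevel.ERROR,
    func: KSparkSession.() -> Unit,
) { /* builds the SparkSession and runs func */ }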

In my case, I'm running Spark on yarn, so my submit command looks like ./spark-submit --master yarn .... But when I start the SparkSession in my code with

withSpark(
    props = mapOf("some_key" to "some_value"),
) {
    //... my Spark code
}

the default "local[*]" for the parameter master is used. This leads to a local Spark session on the application master (which kind of works, but does not make use of the executors provided by yarn), and to yarn complaining after my job finishes that I never started a Spark session, because it doesn't recognize the local one.
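
A quick way to confirm which master was actually picked up (assuming the spark property exposed inside the block is the plain SparkSession) is to print it:

withSpark(
    props = mapOf("some_key" to "some_value"),
) {
    // Prints "local[*]" instead of "yarn" when the default value wins.
    println(spark.sparkContext().master())
}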

I think the value for master should, by default, be taken from the external config loaded by SparkConf(). As a workaround, I load the value myself:

withSpark(
    master = SparkConf().get("spark.master", "local[*]"),
    props = mapOf("some_key" to "some_value"),
) {
    //... my Spark code
}

This works, but is not nice and duplicates the default value "local[*]".

Thanks for letting us know!
How do you suggest we set the default value?
Also, maybe @asm0dey has an idea?

@christopherfrieler thanks for the idea! @Jolanrensen it looks like Christopher provided us with a complete implementation, we just need to change the default value :)

Sure! Just wanted to check whether this is the best way to go, since they did call it a "workaround" and "not nice", haha. But creating a new SparkConf() does seem the best way to access system/JVM variables.
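
For the record, the change being discussed would amount to something like this (a minimal sketch reusing the simplified signature from above; the appName and logLevel defaults are again only illustrative):

import org.apache.spark.SparkConf

fun withSpark(
    props: Map<String, Any> = emptyMap(),
    // Take spark.master from SparkConf, which reads the spark.* JVM system
    // properties set by spark-submit, and fall back to local[*] only if unset.
    master: String = SparkConf().get("spark.master", "local[*]"),
    appName: String = "Kotlin Spark API",
    logLevel: SparkLogLevel = SparkLogLevel.ERROR,
    func: KSparkSession.() -> Unit,
) { /* body unchanged */ }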

Added!