Default value SparkHelper overrides config for spark.master
christopherfrieler opened this issue · comments
Currently, `org.jetbrains.kotlinx.spark.api.withSpark(props, master, appName, logLevel, func)` provides a default value for the parameter `master`. This overrides the value from the external Spark config provided with the `spark-submit` command.
In my case, I'm running Spark on YARN, so my submit command looks like `./spark-submit --master yarn ...`.
But when I start the Spark session in my code with

```kotlin
withSpark(
    props = mapOf("some_key" to "some_value"),
) {
    //... my Spark code
}
```

the default `"local[*]"` for the parameter `master` is used. This leads to a local Spark session on the application master (which sort of works, but does not make use of the executors provided by YARN), and YARN complains that I never started a Spark session after my job finishes, because it doesn't recognize the local one.
I think the value for `master` should by default be taken from the external config loaded by `SparkConf()`. As a workaround, I load the value myself:
```kotlin
withSpark(
    master = SparkConf().get("spark.master", "local[*]"),
    props = mapOf("some_key" to "some_value"),
) {
    //... my Spark code
}
```

This works, but it is not nice and duplicates the default value `"local[*]"`.
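The fallback behavior being requested can be sketched without a Spark dependency. `SparkConf()` picks up JVM system properties prefixed with `spark.` (which is how `spark-submit` hands `--master` to the driver), so this sketch mimics that lookup with `System.getProperty`; `resolveMaster` is a hypothetical helper, not part of the library:

```kotlin
// Hypothetical sketch of the proposed default resolution: prefer the
// externally configured spark.master, fall back to "local[*]".
// In the real library this would be SparkConf().get("spark.master", "local[*]").
fun resolveMaster(): String =
    System.getProperty("spark.master") ?: "local[*]"

fun main() {
    // No spark.master set: fall back to the local default.
    System.clearProperty("spark.master")
    println(resolveMaster()) // prints "local[*]"

    // Simulate spark-submit having set the property on the driver JVM:
    System.setProperty("spark.master", "yarn")
    println(resolveMaster()) // prints "yarn"
}
```

With a default like this, running under `spark-submit --master yarn` would pick up `yarn` automatically, while plain local runs would still get `local[*]` without the caller duplicating the default.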
Thanks for letting us know!
How do you suggest we set the default value?
Also, maybe @asm0dey has an idea?
@christopherfrieler thanks for the idea! @Jolanrensen it looks like Christopher provided us with a complete implementation; we should just change the default value :)
Sure! Just wanted to check whether this is the best way to go, since they did call it a "workaround" and "not nice", haha. But creating a new `SparkConf()` does seem the best way to access system/JVM properties.
Added!