Kotlin / kotlin-spark-api

This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this included as part of Apache Spark 3.x.

Unable to initialize spark in Jupyter Notebook

mlcohen opened this issue · comments

Hi -- I've been attempting to get kotlin-spark to work in Jupyter Notebook (v7.0.2). Unfortunately, every time I try to run the magic line %use spark in my Jupyter notebook (using the Kotlin kernel [kotlin-jupyter-kernel]), I end up getting the following error:

received properties: Properties: {spark=3.3.1, scala=2.13, v=1.2.3, displayLimit=20, displayTruncate=30, spark.app.name=Jupyter, spark.master=local[*], spark.sql.codegen.wholeStage=false, fs.hdfs.impl=org.apache.hadoop.hdfs.DistributedFileSystem, fs.file.impl=org.apache.hadoop.fs.LocalFileSystem}, providing Spark with: {spark.app.name=Jupyter, spark.master=local[*], spark.sql.codegen.wholeStage=false, fs.hdfs.impl=org.apache.hadoop.hdfs.DistributedFileSystem, fs.file.impl=org.apache.hadoop.fs.LocalFileSystem}
23/08/10 10:46:58 INFO SparkContext: Running Spark version 3.3.1
23/08/10 10:46:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/08/10 10:46:58 INFO ResourceUtils: ==============================================================
23/08/10 10:46:58 INFO ResourceUtils: No custom resources configured for spark.driver.
23/08/10 10:46:58 INFO ResourceUtils: ==============================================================
23/08/10 10:46:58 INFO SparkContext: Submitted application: Jupyter
... [clipped log output for brevity]
The problem is found in one of the loaded libraries: check library init codes
org.jetbrains.kotlinx.jupyter.exceptions.ReplEvalRuntimeException: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x5d9e0fc3) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x5d9e0fc3
org.jetbrains.kotlinx.jupyter.exceptions.ReplLibraryException: The problem is found in one of the loaded libraries: check library init codes
	at org.jetbrains.kotlinx.jupyter.exceptions.ReplLibraryExceptionKt.rethrowAsLibraryException(ReplLibraryException.kt:32)
	at org.jetbrains.kotlinx.jupyter.repl.impl.CellExecutorImpl$ExecutionContext.doAddLibraries(CellExecutorImpl.kt:151)
... [clipped log output for brevity]
Caused by: org.jetbrains.kotlinx.jupyter.exceptions.ReplEvalRuntimeException: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x5d9e0fc3) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x5d9e0fc3
	at org.jetbrains.kotlinx.jupyter.repl.impl.InternalEvaluatorImpl.eval(InternalEvaluatorImpl.kt:110)
... [clipped log output for brevity]
Caused by: java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x5d9e0fc3) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x5d9e0fc3
	at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala:213)
	at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:114)
	at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:353)
	at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:290)
	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:339)
	at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:194)
	at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:279)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:464)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704)
	at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953)
	at scala.Option.getOrElse(Option.scala:201)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947)
	at Line_5_jupyter.<init>(Line_5.jupyter.kts:11)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
	at kotlin.script.experimental.jvm.BasicJvmScriptEvaluator.evalWithConfigAndOtherScriptsResults(BasicJvmScriptEvaluator.kt:105)
	at kotlin.script.experimental.jvm.BasicJvmScriptEvaluator.invoke$suspendImpl(BasicJvmScriptEvaluator.kt:47)
	at kotlin.script.experimental.jvm.BasicJvmScriptEvaluator.invoke(BasicJvmScriptEvaluator.kt)
	at kotlin.script.experimental.jvm.BasicJvmReplEvaluator.eval(BasicJvmReplEvaluator.kt:49)
	at org.jetbrains.kotlinx.jupyter.repl.impl.InternalEvaluatorImpl$eval$resultWithDiagnostics$1.invokeSuspend(InternalEvaluatorImpl.kt:103)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
	at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.common.kt:284)
	at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:85)
	at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:59)
	at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
	at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:38)
	at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
	at org.jetbrains.kotlinx.jupyter.repl.impl.InternalEvaluatorImpl.eval(InternalEvaluatorImpl.kt:103)
	... 50 more

I'm running Spark locally; there is no remote cluster setup. Any ideas what I might be doing wrong?

Other details:

  • OS: macOS Ventura 13.4.1
  • Java: openjdk version "17.0.7" 2023-04-18 LTS
  • conda: miniconda 23.7.2
  • python: 3.11.4
  • kotlin-jupyter-kernel: 0.11.0.385
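
In case it's relevant: the IllegalAccessError looks like JDK 17's strong encapsulation at work. Spark 3.3's StorageUtils reaches into the JDK-internal sun.nio.ch.DirectBuffer, and Spark's own launch scripts compensate by passing --add-exports/--add-opens flags to the JVM, which a kernel JVM started from Jupyter would not receive. One possible workaround (unverified, and assuming the kotlin-jupyter kernel honors the KOTLIN_JUPYTER_JAVA_OPTS environment variable for extra JVM arguments) would be to re-export the package before launching Jupyter:

# Unverified sketch: re-export the JDK-internal package Spark needs.
# KOTLIN_JUPYTER_JAVA_OPTS is assumed here to supply extra JVM args to the kernel.
export KOTLIN_JUPYTER_JAVA_OPTS="--add-exports=java.base/sun.nio.ch=ALL-UNNAMED"
jupyter notebook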

This seems specific to your system; I cannot reproduce it. Are you able to run a normal Spark project, without notebooks? For instance, something along the lines of the sketch below.
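
A minimal standalone check, in the style of the project's quick-start example (a plain Kotlin/JVM project with the kotlin-spark-api dependency on the classpath):

import org.jetbrains.kotlinx.spark.api.*

fun main() = withSpark {
    // withSpark spins up a local SparkSession, runs the block, and stops it.
    // dsOf builds a Dataset from the given values; show() forces a job to run.
    dsOf(1, 2, 3, 4, 5)
        .map { it * it }
        .show()
}

Note that on Java 17 a plain JVM launch may hit the same IllegalAccessError unless the --add-exports flag mentioned above is passed to that JVM as well (e.g. via Gradle's jvmArgs).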