data-apis / python-record-api

Inferring Python API signatures from tracing usage.

Fix Koalas Runs

saulshanabrook opened this issue

#86 fixed the GitHub Actions workflow, so the Koalas image that was added in #82 is now built and run.

The image built properly but failed to run. Here is the copied output:

Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/usr/local/lib/python3.8/site-packages/pyspark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
io.delta#delta-core_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-2e6e8f27-d3fa-4533-bc4e-38938ded36f3;1.0
	confs: [default]
	found io.delta#delta-core_2.12;0.7.0 in central
	found org.antlr#antlr4;4.7 in central
	found org.antlr#antlr4-runtime;4.7 in central
	found org.antlr#antlr-runtime;3.5.2 in central
	found org.antlr#ST4;4.0.8 in central
	found org.abego.treelayout#org.abego.treelayout.core;1.0.3 in central
	found org.glassfish#javax.json;1.0.4 in central
	found com.ibm.icu#icu4j;58.2 in central
downloading https://repo1.maven.org/maven2/io/delta/delta-core_2.12/0.7.0/delta-core_2.12-0.7.0.jar ...
	[SUCCESSFUL ] io.delta#delta-core_2.12;0.7.0!delta-core_2.12.jar (96ms)
downloading https://repo1.maven.org/maven2/org/antlr/antlr4/4.7/antlr4-4.7.jar ...
	[SUCCESSFUL ] org.antlr#antlr4;4.7!antlr4.jar (28ms)
downloading https://repo1.maven.org/maven2/org/antlr/antlr4-runtime/4.7/antlr4-runtime-4.7.jar ...
	[SUCCESSFUL ] org.antlr#antlr4-runtime;4.7!antlr4-runtime.jar (6ms)
downloading https://repo1.maven.org/maven2/org/antlr/antlr-runtime/3.5.2/antlr-runtime-3.5.2.jar ...
	[SUCCESSFUL ] org.antlr#antlr-runtime;3.5.2!antlr-runtime.jar (5ms)
downloading https://repo1.maven.org/maven2/org/antlr/ST4/4.0.8/ST4-4.0.8.jar ...
	[SUCCESSFUL ] org.antlr#ST4;4.0.8!ST4.jar (7ms)
downloading https://repo1.maven.org/maven2/org/abego/treelayout/org.abego.treelayout.core/1.0.3/org.abego.treelayout.core-1.0.3.jar ...
	[SUCCESSFUL ] org.abego.treelayout#org.abego.treelayout.core;1.0.3!org.abego.treelayout.core.jar(bundle) (4ms)
downloading https://repo1.maven.org/maven2/org/glassfish/javax.json/1.0.4/javax.json-1.0.4.jar ...
	[SUCCESSFUL ] org.glassfish#javax.json;1.0.4!javax.json.jar(bundle) (3ms)
downloading https://repo1.maven.org/maven2/com/ibm/icu/icu4j/58.2/icu4j-58.2.jar ...
	[SUCCESSFUL ] com.ibm.icu#icu4j;58.2!icu4j.jar (118ms)
:: resolution report :: resolve 1696ms :: artifacts dl 278ms
	:: modules in use:
	com.ibm.icu#icu4j;58.2 from central in [default]
	io.delta#delta-core_2.12;0.7.0 from central in [default]
	org.abego.treelayout#org.abego.treelayout.core;1.0.3 from central in [default]
	org.antlr#ST4;4.0.8 from central in [default]
	org.antlr#antlr-runtime;3.5.2 from central in [default]
	org.antlr#antlr4;4.7 from central in [default]
	org.antlr#antlr4-runtime;4.7 from central in [default]
	org.glassfish#javax.json;1.0.4 from central in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   8   |   8   |   8   |   0   ||   8   |   8   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-2e6e8f27-d3fa-4533-bc4e-38938ded36f3
	confs: [default]
	8 artifacts copied, 0 already retrieved (15071kB/48ms)
20/10/21 12:58:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/10/21 12:58:31 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Invalid Spark URL: spark://HeartbeatReceiver@koalas-1.2.1-0-0-1-3503248195:37093
	at org.apache.spark.rpc.RpcEndpointAddress$.apply(RpcEndpointAddress.scala:66)
	at org.apache.spark.rpc.netty.NettyRpcEnv.asyncSetupEndpointRefByURI(NettyRpcEnv.scala:140)
	at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
	at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
	at org.apache.spark.util.RpcUtils$.makeDriverRef(RpcUtils.scala:34)
	at org.apache.spark.executor.Executor.<init>(Executor.scala:206)
	at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
	at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:201)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:555)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
20/10/21 12:58:31 ERROR Utils: Uncaught exception in thread Thread-5
java.lang.NullPointerException
	at org.apache.spark.scheduler.local.LocalSchedulerBackend.org$apache$spark$scheduler$local$LocalSchedulerBackend$$stop(LocalSchedulerBackend.scala:168)
	at org.apache.spark.scheduler.local.LocalSchedulerBackend.stop(LocalSchedulerBackend.scala:144)
	at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:734)
	at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:2171)
	at org.apache.spark.SparkContext.$anonfun$stop$12(SparkContext.scala:1973)
	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1357)
	at org.apache.spark.SparkContext.stop(SparkContext.scala:1973)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:641)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
20/10/21 12:58:31 WARN MetricsSystem: Stopping a MetricsSystem that is not running
ImportError while loading conftest '/usr/src/app/databricks/conftest.py'.
databricks/conftest.py:41: in <module>
    session = utils.default_session(shared_conf)
databricks/koalas/utils.py:384: in default_session
    session = builder.getOrCreate()
/usr/local/lib/python3.8/site-packages/pyspark/sql/session.py:186: in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
/usr/local/lib/python3.8/site-packages/pyspark/context.py:376: in getOrCreate
    SparkContext(conf=conf or SparkConf())
/usr/local/lib/python3.8/site-packages/pyspark/context.py:135: in __init__
    self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
/usr/local/lib/python3.8/site-packages/pyspark/context.py:198: in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
/usr/local/lib/python3.8/site-packages/pyspark/context.py:315: in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
/usr/local/lib/python3.8/site-packages/py4j/java_gateway.py:1568: in __call__
    return_value = get_return_value(
/usr/local/lib/python3.8/site-packages/py4j/protocol.py:326: in get_return_value
    raise Py4JJavaError(
E   py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
E   : org.apache.spark.SparkException: Invalid Spark URL: spark://HeartbeatReceiver@koalas-1.2.1-0-0-1-3503248195:37093
E   	at org.apache.spark.rpc.RpcEndpointAddress$.apply(RpcEndpointAddress.scala:66)
E   	at org.apache.spark.rpc.netty.NettyRpcEnv.asyncSetupEndpointRefByURI(NettyRpcEnv.scala:140)
E   	at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
E   	at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
E   	at org.apache.spark.util.RpcUtils$.makeDriverRef(RpcUtils.scala:34)
E   	at org.apache.spark.executor.Executor.<init>(Executor.scala:206)
E   	at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
E   	at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
E   	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:201)
E   	at org.apache.spark.SparkContext.<init>(SparkContext.scala:555)
E   	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
E   	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
E   	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
E   	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
E   	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
E   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
E   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
E   	at py4j.Gateway.invoke(Gateway.java:238)
E   	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
E   	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
E   	at py4j.GatewayConnection.run(GatewayConnection.java:238)
E   	at java.lang.Thread.run(Thread.java:748)

@ueshin were you able to test it locally? Would you be able to help debug this failure? I just added you as a maintainer of this repo as well.

@saulshanabrook Sure, I will work on it soon.
When I built the Docker image, I couldn't download the parent image, so I just used the "base" image in the repository instead.

> When I built the Docker image, I couldn't download the parent image, so I just used the "base" image in the repository instead.

Ah yeah, thanks for bringing this up, just opened an issue: #91
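
A possible lead for the `Invalid Spark URL` error in the log above, not confirmed anywhere in this thread: the failing URL embeds the container hostname `koalas-1.2.1-0-0-1-3503248195`, and Java's `java.net.URI` follows the RFC 2396 hostname grammar, under which the last dot-separated label must start with a letter. Here the last label is `1-0-0-1-3503248195`. The sketch below is a rough Python approximation of that grammar for checking a hostname ahead of time; the function name `is_rfc2396_hostname` is made up for illustration and is not part of Spark or Koalas.

```python
import re

# RFC 2396: domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
_DOMAIN_LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?")
# RFC 2396: toplabel = alpha | alpha *( alphanum | "-" ) alphanum
_TOP_LABEL = re.compile(r"[A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])?")


def is_rfc2396_hostname(host: str) -> bool:
    """Approximate check of the RFC 2396 hostname grammar (hypothetical helper)."""
    # The grammar allows a single trailing dot.
    if host.endswith("."):
        host = host[:-1]
    if not host:
        return False
    labels = host.split(".")
    # Every label except the last may start with a letter or a digit.
    if not all(_DOMAIN_LABEL.fullmatch(label) for label in labels[:-1]):
        return False
    # The last label must start with a letter.
    return _TOP_LABEL.fullmatch(labels[-1]) is not None


if __name__ == "__main__":
    for name in ("koalas-1.2.1-0-0-1-3503248195", "localhost"):
        print(name, is_rfc2396_hostname(name))
```

If this does turn out to be the cause, one thing to try would be giving the container a hostname that passes the check, or overriding what Spark uses through the `SPARK_LOCAL_HOSTNAME` environment variable or the `spark.driver.host` setting. Whether that is the right fix for this repository's images is an assumption to verify.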