StageSkewAnalyzer: Arithmetic Exception: division by zero
enaggar opened this issue · comments
I'm receiving this error when running Sparklens on a Spark history file:
Failed in Analyzer StageSkewAnalyzer
java.lang.ArithmeticException: / by zero
at com.qubole.sparklens.analyzer.StageSkewAnalyzer$$anonfun$computePerStageEfficiencyStatistics$3.apply(StageSkewAnalyzer.scala:109)
at com.qubole.sparklens.analyzer.StageSkewAnalyzer$$anonfun$computePerStageEfficiencyStatistics$3.apply(StageSkewAnalyzer.scala:90)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at com.qubole.sparklens.analyzer.StageSkewAnalyzer.computePerStageEfficiencyStatistics(StageSkewAnalyzer.scala:90)
at com.qubole.sparklens.analyzer.StageSkewAnalyzer.analyze(StageSkewAnalyzer.scala:33)
at com.qubole.sparklens.analyzer.AppAnalyzer$class.analyze(AppAnalyzer.scala:32)
at com.qubole.sparklens.analyzer.StageSkewAnalyzer.analyze(StageSkewAnalyzer.scala:27)
at com.qubole.sparklens.analyzer.AppAnalyzer$$anonfun$startAnalyzers$1.apply(AppAnalyzer.scala:91)
at com.qubole.sparklens.analyzer.AppAnalyzer$$anonfun$startAnalyzers$1.apply(AppAnalyzer.scala:89)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
at com.qubole.sparklens.analyzer.AppAnalyzer$.startAnalyzers(AppAnalyzer.scala:89)
at com.qubole.sparklens.QuboleJobListener.onApplicationEnd(QuboleJobListener.scala:168)
at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:57)
at org.apache.spark.scheduler.ReplayListenerBus.doPostEvent(ReplayListenerBus.scala:35)
at org.apache.spark.scheduler.ReplayListenerBus.doPostEvent(ReplayListenerBus.scala:35)
at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63)
at org.apache.spark.scheduler.ReplayListenerBus.postToAll(ReplayListenerBus.scala:35)
at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:85)
at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:58)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.qubole.sparklens.app.EventHistoryReporter.<init>(EventHistoryReporter.scala:38)
at com.qubole.sparklens.app.ReporterApp$.parseInput(ReporterApp.scala:54)
at com.qubole.sparklens.app.ReporterApp$.delayedEndpoint$com$qubole$sparklens$app$ReporterApp$1(ReporterApp.scala:27)
at com.qubole.sparklens.app.ReporterApp$delayedInit$body.apply(ReporterApp.scala:20)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at com.qubole.sparklens.app.ReporterApp$.main(ReporterApp.scala:20)
at com.qubole.sparklens.app.ReporterApp.main(ReporterApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
In the output I can see total number of cores available = 10 and total number of executors = 11. What could be the cause of this? With integer division, 10 cores / 11 executors makes the executorCores variable equal to zero, which leads to the exception above.
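A minimal sketch of the arithmetic described above (the names and the guard are illustrative assumptions, not the actual Sparklens source): integer division of 10 cores by 11 executors yields 0, and any subsequent division by that value throws `ArithmeticException`. Flooring the result at 1 avoids it:

```scala
object ExecutorCoresSketch {
  // Unguarded: mirrors the reported situation (10 cores, 11 executors).
  // Integer division truncates, so 10 / 11 == 0.
  def executorCores(totalCores: Int, totalExecutors: Int): Int =
    totalCores / totalExecutors

  // Guarded variant (hypothetical fix): clamp to at least 1 core per
  // executor so later ratios never divide by zero.
  def safeExecutorCores(totalCores: Int, totalExecutors: Int): Int =
    math.max(1, totalCores / math.max(1, totalExecutors))

  def main(args: Array[String]): Unit = {
    println(executorCores(10, 11))     // prints 0
    println(safeExecutorCores(10, 11)) // prints 1
  }
}
```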
Thanks for raising this issue. I will check and get back to you shortly. Are you using dynamic allocation / autoscaling of executors?
Getting the same error when running in a Databricks notebook.
EfficiencyStatisticsAnalyzer and StageSkewAnalyzer both throw this error in a Jupyter notebook, and it seems to have the same cause: in AppContext.getMaxConcurrent, maxConcurrent never gets higher than 0 in those cases.
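A sketch of the kind of guard that addresses this (an assumed shape, not the actual patch): if the observed concurrency samples are all 0, as reported for notebook runs, flooring the result at 1 keeps downstream divisions safe.

```scala
object MaxConcurrentSketch {
  // Hypothetical per-interval task-concurrency samples; in a notebook
  // run these can all be 0, which is what makes maxConcurrent 0.
  def maxConcurrent(samples: Seq[Int]): Int =
    if (samples.isEmpty) 1
    else math.max(1, samples.max) // floor at 1 so callers never divide by zero

  def main(args: Array[String]): Unit = {
    println(maxConcurrent(Seq(0, 0, 0))) // prints 1
    println(maxConcurrent(Seq(0, 3, 2))) // prints 3
  }
}
```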
We don't use dynamic allocation / autoscaling of executors.
@iamrohit Thanks for fixing this ;)