qubole / sparklens

Qubole Sparklens tool for performance tuning Apache Spark

Home Page: http://sparklens.qubole.com


Error parsing spark 2.2.0 event logs

grantnicholas opened this issue · comments

Certain Spark 2.2.0 event logs (dumped from Qubole) cannot be parsed, failing with this error:

18/08/20 16:52:05 ERROR ReplayListenerBus: Malformed line #2: {"Event":"org.apache.spark.scheduler.SparkListenerAMStart","containerId":"container_1534754524533_0006_01_000001","hostname":"$REDACTED"}
java.lang.ClassNotFoundException: org.apache.spark.scheduler.SparkListenerAMStart
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
	at org.apache.spark.util.JsonProtocol$.sparkEventFromJson(JsonProtocol.scala:521)
	at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:58)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.qubole.sparklens.app.EventHistoryReporter.<init>(EventHistoryReporter.scala:33)

Spark tries to look up the event class from the event type in the JSON file, but it encountered an event-type-to-class mapping that could not be resolved. I could not find any references to SparkListenerAMStart in Spark's source code, so I'm curious how Spark could have generated this event.

My guesses as to what happened. Either:

  • Spark compatibility issues between old and new versions of Spark: the event log was written with Spark 2.2 but the parser assumes Spark 2.0, i.e. the deserialization code from Spark 2.0 does not have the proper mapping to handle Spark 2.2 events.
  • Some Qubole-custom extension to Spark is writing out these non-standard events.

Hi Grant, thanks for reporting this.

Yes, this event is specific to Qubole, and there are other such events that Sparklens currently doesn't need to consider, so we ignore them.

Can you share what command you used to run Sparklens reporting from the event-history file? Also, which version of the Spark binaries did you use when you ran Sparklens reporting? Ideally, binaries of any version from Spark 2.0 onwards should be able to report.

I built sparklens from source (the master branch of this repo, commit hash: b0d5295).

By default the master branch sbt file uses spark version 2.0.0.

Notably, I did not get this error when using the pre-built binaries (qubole:sparklens:0.2.0-s_2.11). Instead, I got a different error, a division by zero. To debug that issue, I tried building from source, and that is when I encountered this event-parsing issue.

Okay.

  1. Yes, the master branch uses 2.0.0, but it is a "provided" dependency, i.e. Sparklens will use the Spark jars/classes from whatever Spark binaries you run it with; the standalone Sparklens jar does not bundle Spark binaries. As pointed out previously, we have tried to keep Sparklens usable with any Spark version from 2.0 onwards.
  2. You can create an issue for the division-by-zero bug. And you are most welcome to raise a fix PR for it as well. :)

I missed adding this: the release qubole:sparklens:0.2.0-s_2.11 is built from Sparklens master, commit hash 5f7ba57, which is basically the hash you mentioned minus README changes. Not sure why you see that error. You can try the following:

  1. Run sbt clean package.
  2. Make sure the Spark binaries are Spark version 2.0 or later.

Gotcha, so just to confirm: it sounds like the only way to read a Qubole-generated Spark event log is to use Qubole's Spark binaries, since the Qubole-generated event log will contain Qubole-specific events that standard Spark binaries cannot parse.

I was trying to run sparklens locally on my laptop (using standard spark 2.0.0 binaries, not qubole-spark binaries) which would explain this issue.

One possible workaround is to blacklist the nonstandard Qubole-specific events in the file/input stream before it is fed to the Spark parser.
Another workaround is to document that Qubole-generated Spark event logs must be parsed with Qubole Spark binaries.
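A minimal sketch of the first workaround, filtering out event-log lines whose "Event" field names a Qubole-only class before they reach the Spark parser (the object name is hypothetical, and the blacklist below contains only the one event seen in this issue; a real filter would need the full list of Qubole-specific events):

```scala
// Sketch of the blacklist workaround: drop event-log lines whose "Event"
// field names a Qubole-only listener event before Spark tries to parse them.
object QuboleEventFilter {
  // Only the event from this issue; a complete filter would list all
  // Qubole-specific events.
  val ignoredEvents: Set[String] =
    Set("org.apache.spark.scheduler.SparkListenerAMStart")

  // True if the JSON line should be passed through to the Spark parser.
  def keep(line: String): Boolean =
    !ignoredEvents.exists(ev => line.contains("\"Event\":\"" + ev + "\""))

  def filter(lines: Iterator[String]): Iterator[String] =
    lines.filter(keep)
}
```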

The first suggestion you mentioned sounds correct, and it is already implemented. Please check the method EventHistoryReporter.getFilter in the Sparklens code base.

Yes, but getFilter is only called if a NoSuchMethodException is thrown:
https://github.com/qubole/sparklens/blob/master/src/main/scala/com/qubole/sparklens/app/EventHistoryReporter.scala#L35-L39

I am getting a java.lang.ClassNotFoundException exception because the class SparkListenerAMStart could not be found.

This is because, internally, the replay method calls JsonProtocol.sparkEventFromJson, which calls Utils.classForName(other), where other is SparkListenerAMStart.
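In simplified form, that dispatch path can be sketched as follows (this is an illustrative sketch, not Spark's actual code; the real method lives in JsonProtocol.sparkEventFromJson):

```scala
// Simplified sketch of Spark's event dispatch in JsonProtocol:
// known event names map directly to case classes, and anything else is
// resolved reflectively with Class.forName. A Qubole-only event name
// that is not on the classpath therefore surfaces as a
// ClassNotFoundException rather than being skipped.
object EventDispatchSketch {
  def sparkEventFromJson(eventName: String): Any = eventName match {
    case "SparkListenerApplicationStart" => "known event" // direct mapping
    case other =>
      // Throws ClassNotFoundException if `other` is not on the classpath,
      // e.g. org.apache.spark.scheduler.SparkListenerAMStart on stock Spark.
      Class.forName(other)
  }
}
```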

Got your point. So if both of the following hold, we get this ClassNotFoundException:

  1. The Spark binaries are version 2.0.
  2. An event unknown to those Spark binaries is seen (like SparkListenerAMStart).

This looks like a miss on the dev side. I actually misread the Spark 2.0 code, which handles ClassNotFoundException only for a few specific classes, thinking it would be handled for all events: https://github.com/apache/spark/blob/branch-2.0/core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala#L75

However, for Spark versions greater than 2.0, this will not be a problem. It can be fixed; if you have something in mind, please do raise a PR.
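One possible shape for such a fix, sketched under the assumption that replay can be driven line by line (the names here are hypothetical, not an actual Sparklens patch):

```scala
// Hypothetical fix sketch: instead of letting one unknown event class
// abort the whole replay, catch ClassNotFoundException per line and
// skip that event. `parse` stands in for the JSON-to-event step that
// ReplayListenerBus performs for each line; returns the skip count.
object TolerantReplay {
  def replay(lines: Iterator[String], parse: String => Unit): Int = {
    var skipped = 0
    lines.foreach { line =>
      try parse(line)
      catch {
        case _: ClassNotFoundException =>
          // Vendor-specific or future event: note it and keep going.
          skipped += 1
      }
    }
    skipped
  }
}
```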