qubole / sparklens

Qubole Sparklens tool for performance tuning Apache Spark

Home Page: http://sparklens.qubole.com

How does the source=history option work?

Shasidhar opened this issue:

I am trying to run Sparklens on the event logs of my application.

I am using the following command:

./bin/spark-submit \
	--packages qubole:sparklens:0.2.0-s_2.11 \
	--master local[0] \
	--class com.qubole.sparklens.app.ReporterApp \
	qubole-dummy-arg file:///Users/shasidhar/interests/sparklens/eventlog.txt source=history

I see the following output in the console:

Ivy Default Cache set to: /Users/shasidhar/.ivy2/cache
The jars for the packages stored in: /Users/shasidhar/.ivy2/jars
:: loading settings :: url = jar:file:/Users/shasidhar/interests/spark/spark-2.3.0-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
qubole#sparklens added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
	confs: [default]
	found qubole#sparklens;0.2.0-s_2.11 in spark-packages
:: resolution report :: resolve 177ms :: artifacts dl 5ms
	:: modules in use:
	qubole#sparklens;0.2.0-s_2.11 from spark-packages in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   1   |   0   |   0   |   0   ||   1   |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
	confs: [default]
	0 artifacts copied, 1 already retrieved (0kB/6ms)
2019-01-03 15:46:11 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Warning: Local jar /Users/shasidhar/interests/spark/spark-2.3.0-bin-hadoop2.7/qubole-dummy-arg does not exist, skipping.

2019-01-03 15:46:52 INFO  ShutdownHookManager:54 - Shutdown hook called
2019-01-03 15:46:52 INFO  ShutdownHookManager:54 - Deleting directory /private/var/folders/3t/rfd2djjs1yg30mhmw8z_s7tw0000gp/T/spark-7a992110-6a4f-44f4-9473-1ddade11b53a

What exactly do I need to look at after this? Does it generate a Sparklens JSON file? If so, where can I find the output file?

Hi @Shasidhar,

I would expect this to print the usual Sparklens report on the console. We don't yet support converting an event history file to a Sparklens JSON file (we will be adding this soon). Here is how we generate sparklens.json from a running application:

--packages qubole:sparklens:0.2.0-s_2.11
--conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener
--conf spark.sparklens.reporting.disabled=true
--conf spark.sparklens.data.dir=/dir/for/saving/sparklens.json
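As a rough illustration, the options above might be combined into a single spark-submit invocation like the one below. This is a minimal sketch, not an official Sparklens script: `APP_CLASS` and `APP_JAR` (with their defaults `com.example.MyApp` and `my-app.jar`) are hypothetical placeholders for your own application, and the data directory is the same placeholder path used above. The script only prints the command unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch: run an application with the Sparklens listener attached so that it
# writes sparklens.json instead of printing the report. APP_CLASS and APP_JAR
# are hypothetical placeholders; replace them with your own application.
set -euo pipefail

SPARKLENS_PKG="qubole:sparklens:0.2.0-s_2.11"
DATA_DIR="${DATA_DIR:-/dir/for/saving/sparklens.json}"  # placeholder path from the thread
APP_CLASS="${APP_CLASS:-com.example.MyApp}"             # hypothetical
APP_JAR="${APP_JAR:-my-app.jar}"                        # hypothetical

# One array element per argument, so values with spaces stay intact.
args=(
  --packages "$SPARKLENS_PKG"
  --conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener
  --conf spark.sparklens.reporting.disabled=true
  --conf "spark.sparklens.data.dir=$DATA_DIR"
  --class "$APP_CLASS"
  "$APP_JAR"
)

# Always print the (shell-quoted) command; only launch it when DRY_RUN=0.
printf '%q ' ./bin/spark-submit "${args[@]}"
echo
if [ "${DRY_RUN:-1}" = 0 ]; then
  ./bin/spark-submit "${args[@]}"
fi
```

Building the options in a bash array keeps each `--conf` value as a single argument even if the data directory path contains spaces.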

@iamrohit Understood. In that case, for some reason I am not seeing the report.

@Shasidhar Maybe something is wrong with your event log file? Can you try running with this file [sparklens/src/test/event-history-test-files/local-1532512550423] and check whether you still get no results?
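If the event log itself is the suspect, one quick heuristic is to confirm that it contains the listener events a report would be built from. The sketch below assumes an uncompressed Spark event log with one compact JSON event per line (the format Spark writes when event logging is enabled). The helper name `check_event_log` and the choice of `SparkListenerApplicationStart` and `SparkListenerTaskEnd` as required events are illustrative assumptions, not what Sparklens itself validates.

```shell
#!/usr/bin/env bash
# Heuristic check of a Spark event log before running ReporterApp with
# source=history. Assumes an uncompressed log with one compact JSON event
# per line; compressed logs (.lz4, .snappy, ...) must be decompressed first.
set -euo pipefail

# check_event_log is a hypothetical helper, not part of Sparklens.
check_event_log() {
  local log="$1" ev n
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 1
  fi
  for ev in SparkListenerApplicationStart SparkListenerTaskEnd; do
    # -F: fixed string, -c: count matching lines. grep prints 0 and exits
    # with status 1 when nothing matches, hence the "|| true".
    n=$(grep -cF "\"Event\":\"$ev\"" "$log" || true)
    echo "$ev events: $n"
    if [ "$n" -eq 0 ]; then
      echo "no $ev events found; the log may be empty, truncated or compressed" >&2
      return 1
    fi
  done
}
```

For example, `check_event_log /Users/shasidhar/interests/sparklens/eventlog.txt` takes a plain local path, without the `file://` prefix used in the spark-submit command.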

@iamrohit Yes, it looks like an issue with my event logs. I will figure it out, thanks. Is there an issue I can follow for the feature that will generate the sparklens.json file from event logs?