qubole / sparklens

Qubole Sparklens tool for performance tuning Apache Spark

Home Page: http://sparklens.qubole.com

EventHistoryToSparklensJson doesn't work with local or HDFS events file/directory

Harshit22 opened this issue

The EventHistoryToSparklensJson class treats the input event-file argument as a local file or directory. However, the EventHistoryReporter class, which it uses internally, reads the same path as an HDFS file.

As a result, neither a local nor an HDFS event file works on its own with EventHistoryToSparklensJson, even though the documentation says the input file should be a local path.

To work around this, I had to keep the event file in both the local and HDFS filesystems at identical paths.

Jar used: https://mvnrepository.com/artifact/qubole/sparklens/0.3.1-s_2.11
Environment: Java 8, Scala 2.11, Spark 2.4.3 on AWS EMR

@Harshit22 It works for local files and directories, provided the "local" machine doesn't have HDFS set up. Are you running it from one of the cluster machines where HDFS is configured?
The primary reason for supporting HDFS was so that, when Sparklens runs alongside a Spark application, the Sparklens JSON file can be saved to a known S3 or HDFS location. This is useful if you don't have SSH access to the machine running the driver.
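The maintainer's point about HDFS being configured comes down to how a path without a scheme gets resolved. The sketch below is a simplified, hypothetical model (not Sparklens or Hadoop code) of what Hadoop's `FileSystem.makeQualified` does for an absolute path: a scheme-less path inherits the scheme and authority of `fs.defaultFS`. It assumes, without confirmation from this thread, that EventHistoryReporter opens the path through Hadoop's FileSystem API. The `PathResolution` object, the `qualify` helper, the event-log path and the `namenode:8020` address are all made up for illustration.

```scala
import java.net.URI

// Hypothetical, simplified model of Hadoop path qualification.
// Not taken from Sparklens or Hadoop; for illustration only.
object PathResolution {

  // Returns the URI that a path effectively points to, given the node's
  // default filesystem (fs.defaultFS). Paths that already carry a scheme
  // (file:, hdfs:, s3:) are returned unchanged. Relative paths are out of
  // scope for this sketch, so they are rejected.
  def qualify(path: String, defaultFs: String): URI = {
    val uri = new URI(path)
    if (uri.getScheme != null) uri
    else {
      require(path.startsWith("/"), s"only absolute paths are modelled: $path")
      new URI(defaultFs.stripSuffix("/") + uri.getPath)
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical event-log path.
    val eventLog = "/tmp/spark-events/app-0001"

    // Machine without HDFS: fs.defaultFS is the local filesystem.
    println(qualify(eventLog, "file:///"))
    // Cluster node with HDFS configured: the same string now names an HDFS file.
    println(qualify(eventLog, "hdfs://namenode:8020"))
  }
}
```

Running `main` prints `file:///tmp/spark-events/app-0001` and then `hdfs://namenode:8020/tmp/spark-events/app-0001`. If the assumption above holds, the same input string names two different files depending on the node's configuration, which would explain why the event file had to exist at the same path in both places.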