qubole / sparklens

Qubole Sparklens tool for performance tuning Apache Spark

Home Page: http://sparklens.qubole.com

What is the overhead of running sparklens with every job?

singhania opened this issue · comments

commented

I am planning to keep sparklens running for every job so that it gives me statistics for each one, and later on I can compare different metrics across different jobs.
I wanted to figure out what the overhead of running sparklens is for a Spark job.

@singhania Nothing much while the application is running. At the end of the application, sparklens runs some simulations to estimate completion times with different executor counts. This part is multi-threaded but can still take time if the application has a very high number of tasks (say, in the millions). For most applications the simulation takes a few seconds, but we have seen some large applications where it can go up to 2 minutes (the largest value observed). Applications with a large number of tasks typically take a long time to complete as well (say, 2 minutes of extra processing for an application that runs for 2 hours). So if you measure it as a percentage of actual runtime, it will likely be safe to use sparklens all the time.
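To put the "percentage of actual runtime" point in numbers, here is a rough sketch of the arithmetic. The function name `overhead_percent` is just something made up for illustration and is not part of sparklens; the inputs are the 2 minutes and 2 hours quoted above.

```python
def overhead_percent(simulation_seconds: float, runtime_seconds: float) -> float:
    """Return the end-of-job simulation time as a percentage of application runtime.

    Hypothetical helper for illustration only; not a sparklens API.
    """
    if runtime_seconds <= 0:
        raise ValueError("runtime_seconds must be positive")
    return 100.0 * simulation_seconds / runtime_seconds


# Figures from the reply above: 2 minutes of simulation after a 2-hour run.
worst_case = overhead_percent(2 * 60, 2 * 60 * 60)
print(f"{worst_case:.2f}%")  # 120 s / 7200 s, i.e. under 2%
```

So even the largest simulation time observed works out to well under 2% of that job's runtime, assuming the job really does run for about 2 hours.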

commented

Thanks for the quick response.