qubole / sparklens

Qubole Sparklens tool for performance tuning Apache Spark

Home Page: http://sparklens.qubole.com

formula for ideal executor plot

chen116 opened this issue

Hi,

First of all, amazing project!

In the report generated at http://sparklens.qubole.com/, I see the ideal executor plot, which plots "the minimal number of executors (ideal) which could have finished the same work in same amount of wall clock time".

I am curious about the formulas or equations behind this plot. If you could explain how you approach it, that would be great. Thanks!

Hi @chen116,

Thanks for the kind words. The calculation you are referring to comes from simulation. It works just like the completion time vs. efficiency graph at different executor counts, except that instead of simulating the complete application, here we simulate each Spark job individually. To find the minimal number of executors required for a particular job, we do a binary search between 1 and the total number of executors. The graph has two purposes: first, to show whether autoscaling would be useful, and second, to judge how a given autoscaling policy is doing compared to "ideal" autoscaling.
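A minimal Python sketch of the per-job binary search described above, under assumptions that are not from Sparklens itself: a job is modelled as a flat list of independent task durations, each executor offers a fixed number of task slots, and each task is placed greedily on the slot that frees up first. The function names `simulate_job_time` and `min_executors_for_job`, the `cores_per_executor` parameter, and this scheduling model are hypothetical illustrations, not Sparklens's actual simulator.

```python
import heapq
from typing import List


def simulate_job_time(task_durations: List[float], executors: int,
                      cores_per_executor: int = 1) -> float:
    """Estimate one job's wall-clock time by greedy list scheduling.

    Hypothetical model: tasks are independent, every executor offers
    `cores_per_executor` slots, and each task runs on the slot that
    becomes free earliest.
    """
    # Each heap entry is the time at which one task slot becomes free.
    slots = [0.0] * (executors * cores_per_executor)
    heapq.heapify(slots)
    for duration in task_durations:
        free_at = heapq.heappop(slots)             # earliest-available slot
        heapq.heappush(slots, free_at + duration)  # slot busy until task ends
    return max(slots)


def min_executors_for_job(task_durations: List[float], max_executors: int,
                          target_time: float, cores_per_executor: int = 1) -> int:
    """Binary search in [1, max_executors] for the fewest executors whose
    simulated time does not exceed target_time.

    Assumes simulated time is non-increasing in the executor count; if even
    max_executors misses the target, max_executors is returned.
    """
    lo, hi = 1, max_executors
    while lo < hi:
        mid = (lo + hi) // 2
        if simulate_job_time(task_durations, mid, cores_per_executor) <= target_time:
            hi = mid       # mid executors are enough; try fewer
        else:
            lo = mid + 1   # too slow; more executors are needed
    return lo


if __name__ == "__main__":
    tasks = [4, 4, 4, 4, 2, 2, 2, 2]          # hypothetical task durations
    target = simulate_job_time(tasks, 8)      # time with all 8 executors
    print(min_executors_for_job(tasks, 8, target))  # prints 6
```

Here the target is the simulated time with all executors, so the search answers "how few executors would still have finished this job in the same time". Repeating it for every job and plotting the results against each job's time window gives an ideal-executor curve of the kind the report shows.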

We will be talking more about it here: https://conferences.oreilly.com/strata/strata-eu/public/schedule/detail/74183

thanks!