qubole / sparklens

Qubole Sparklens tool for performance tuning Apache Spark

Home Page: http://sparklens.qubole.com

Scalability aware Autoscaling with Sparklens

beriaanirudh opened this issue · comments

With repetitive workloads (such as ETLs), Sparklens can use the resource requirements observed in previous runs of a Spark application to autoscale executors, so that the application meets the same latency with the minimum number of executors needed at every job. This holds provided all other configurations of the application remain the same.

This can be done by the following:

  1. On the first run of an app, the Sparklens JSON will contain all the information needed for this. We will now show graphs comparing the actual executor scaling with the minimum executor autoscaling at which the same app latency can be achieved. This minimum number is computed on a per-job basis for the application.
  2. When the same app is run again, the user can pass the Sparklens JSON from the previous run, along with another configuration, to let Sparklens dictate the autoscaling of executors for this run.
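
To make the per-job minimum concrete, here is a minimal Python sketch of how such a plan could be derived from a previous run. It is only an illustration: the JSON layout, the field names (`jobs`, `jobId`, `actualExecutors`, `estimatedSeconds`) and the function `min_executors_per_job` are hypothetical and are not the actual Sparklens JSON schema or API. It assumes the previous run yields, for each job, an estimated completion time at several executor counts.

```python
import json

# Hypothetical summary of a previous run. This is NOT the real Sparklens
# JSON schema; the field names are invented for illustration only.
PREVIOUS_RUN_JSON = """
{
  "jobs": [
    {"jobId": 0, "actualExecutors": 10,
     "estimatedSeconds": {"2": 120, "4": 60, "6": 60, "10": 60}},
    {"jobId": 1, "actualExecutors": 10,
     "estimatedSeconds": {"2": 300, "5": 130, "8": 90, "10": 90}}
  ]
}
"""


def min_executors_per_job(summary, tolerance=0.0):
    """Map each jobId to the smallest executor count whose estimated time
    stays within `tolerance` (a fraction) of the time observed at the
    executor count actually used in the previous run."""
    plan = {}
    for job in summary["jobs"]:
        # JSON object keys are strings, so convert executor counts to int.
        estimates = {int(n): t for n, t in job["estimatedSeconds"].items()}
        baseline = estimates[job["actualExecutors"]]
        limit = baseline * (1.0 + tolerance)
        plan[job["jobId"]] = min(n for n, t in estimates.items() if t <= limit)
    return plan


plan = min_executors_per_job(json.loads(PREVIOUS_RUN_JSON))
print(plan)
```

With the sample data above, job 0 needs 4 executors and job 1 needs 8 to match the latency of the previous run at 10 executors. A non-zero `tolerance` trades a bounded increase in latency for fewer executors.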