qubole / sparklens

Qubole Sparklens tool for performance tuning Apache Spark

Home Page: http://sparklens.qubole.com

MaxTaskMem 0.0 KB always

kretes opened this issue

Hello there! Thanks for the tool, it looks promising. Nice that it works on historical event data!

I tried that and got 'Max memory which an executor could have taken = 0.0 KB', and 0.0 KB everywhere in the 'MaxTaskMem' column, which is obviously wrong since the job ran and used some memory.
The other metrics look at least sane.

I wonder what could be the cause?

My ultimate goal is to find out how much I can lower the memory footprint of a job without getting an OOM; this 'MaxTaskMem' field looks like a good fit for that, right?

Thanks @kretes for trying this out. We have seen this issue but never debugged it. I took a look. It seems that JsonProtocol.scala, the main class responsible for converting events to JSON (which gets logged into the event log files), does not handle the peak execution memory metric (part of TaskMetrics). We can try to get this patched in open source Spark, but that will take its own sweet time.

For now, the only useful way to get this value is to run sparklens as part of your application. It is possible to configure sparklens not to run simulations. In this mode, it will save a sparklens output file (typically a few MBs in size). You can use this output file just like the event log history files and get the same output (with simulation) at any later point in time.
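For reference, here is a minimal sketch of attaching sparklens to a running application. The listener class and configuration keys below are assumptions based on the sparklens README and may differ between versions, so please verify them for your release.

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Attach the sparklens listener to the job under diagnosis (assumed class name
// and config keys; check the sparklens README for your version).
val conf = new SparkConf()
  .set("spark.extraListeners", "com.qubole.sparklens.QuboleJobListener")
  .set("spark.sparklens.reporting.disabled", "true") // only dump the sparklens data file, skip the in-app report
  .set("spark.sparklens.data.dir", "/tmp/sparklens") // where the data file is written (assumed default)

val spark = SparkSession.builder()
  .config(conf)
  .appName("job-under-diagnosis")
  .getOrCreate()

The saved data file can then be fed back to sparklens later to produce the full report, including MaxTaskMem.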

Thanks again for sharing your findings.

Thanks @iamrohit - it works when run together with the diagnosed job.

Still, I wonder how I should interpret MaxTaskMem.
In the code, sparklens does the following:

val maxTaskMemory = sts.taskPeakMemoryUsage.take(executorCores.toInt).sum // this could be at different times?

And if it is a sum of as many peakMemoryUsage values as there are executorCores, then it is not a 'task' metric, since one task runs on one core, right?

I got a result of 16 GB in one stage, and even though the executor has that much memory, it is not all available to a single task if several of them are running concurrently on the same executor.

MaxTaskMem is a very wrong name! Let me first take that as an action item.

What is MaxTaskMem?
Executor memory is divided into user memory and Spark memory. The Spark memory is further divided into native and on-heap, and also into storage and execution. For every task, Spark reports the maximum (peak) execution memory used by that task. The goal of MaxTaskMem is to figure out the worst-case execution memory requirement per stage for any executor. We compute this number by sorting all tasks of a stage by peakExecutionMemory and picking the top N values, where N is the number of cores per executor. Essentially: if all the largest tasks of a stage end up running on a single executor, how much execution memory would they need?
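A minimal sketch of that calculation (illustrative names only, not the actual sparklens internals):

// Worst-case execution memory an executor could need for one stage: the sum of
// the N largest per-task peak execution memory values, N = cores per executor.
def worstCaseExecutorExecMem(taskPeakExecMemBytes: Seq[Long],
                             coresPerExecutor: Int): Long =
  taskPeakExecMemBytes.sorted(Ordering[Long].reverse)
    .take(coresPerExecutor)
    .sum

// Example with 4 cores per executor (values in bytes, purely illustrative):
val peaks = Seq(512L, 2048L, 1024L, 4096L, 256L, 8192L)
val worstCase = worstCaseExecutorExecMem(peaks, coresPerExecutor = 4)
// 8192 + 4096 + 2048 + 1024 = 15360 bytes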

Usually, by default, Spark allocates around 60-70% of executor memory for execution + storage. This means that if we pick the max value of MaxTaskMem across all stages and, say, double it, we are certain that in no stage can any possible combination of N tasks (where N = number of cores per executor) cause the executor to run out of execution memory. Note that we are not accounting for user memory, so this doesn't really mean no OOM; that still remains a possibility. It probably does mean no spill. But then again, spill is mostly harmless: it may slow things down, but it will not kill the executor.
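As a back-of-the-envelope illustration of the rule of thumb above (a sketch assuming Spark's default unified memory settings, roughly spark.memory.fraction = 0.6 with about 300 MB reserved; user memory is ignored, as noted):

// Rough executor heap sizing from the worst-case execution memory need.
// The defaults below are assumptions; tune them to your Spark configuration.
def suggestedExecutorHeapBytes(maxTaskMemAcrossStages: Long,
                               memoryFraction: Double = 0.6,
                               reservedBytes: Long = 300L * 1024 * 1024): Long =
  (maxTaskMemAcrossStages / memoryFraction).toLong + reservedBytes

// Example: a 6 GB worst case suggests roughly a 10.3 GB executor heap.
val heap = suggestedExecutorHeapBytes(6L * 1024 * 1024 * 1024)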

Please take a look here for some more information: https://www.qubole.com/blog/an-introduction-to-apache-spark-optimization-in-qubole/

I will probably be raising a pull request for a different feature which might be useful for memory management in Spark. We call it GC-aware task scheduling. Essentially, instead of right-sizing the memory per executor, this feature changes the number of active cores per executor dynamically, thus making memory available for huge tasks at the expense of some loss of concurrency. Hopefully this will settle the question of memory tuning in Spark once and for all.

Hello!

Thanks for the thorough explanation.

I would then call this metric 'WorstCaseExecutorMemUsage'. And this is actually what I am looking for, although in my case the worst case didn't happen at all, since it was way more than the executor memory.
I will be looking at the result of this metric in a few more jobs.

The feature you are describing sounds like a good thing to do! Looking forward to it!

@iamrohit - Would the calculation of maxMem in StageSkewAnalyzer.scala still be valid in case there are different (independent) stages executing in parallel? Currently sparklens looks at stage-level maxMemory and then finds the global maximum. What if there are 2 or more tasks belonging to different stages which get scheduled on the same executor? In that case, should we take the sum over the (vcore number of) tasks with the highest peakExecutionMemory across those stages?

You are right @anupamaggarwal. We should probably change the logic to account for parallel stages. What we have seen in practice is that this worst case is typically very unlikely, and even if it occurs and an executor goes out of memory, the execution typically proceeds because some of the tasks will go to other executors on retry. Feel free to raise a PR.
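A hedged sketch of the adjustment being discussed (illustrative names and types, not the actual sparklens code): pool the per-task peak execution memory of stages whose execution windows overlap, then take the top N over the pooled set.

// Illustrative only: pool tasks from stages that overlap in time, then take
// the N largest peaks (N = cores per executor) as the worst case.
// Assumes at least one stage is passed in.
case class StageWindow(startMs: Long, endMs: Long, taskPeakExecMemBytes: Seq[Long])

def overlaps(a: StageWindow, b: StageWindow): Boolean =
  a.startMs < b.endMs && b.startMs < a.endMs

def worstCaseWithParallelStages(stages: Seq[StageWindow],
                                coresPerExecutor: Int): Long =
  stages.map { s =>
    val pooled = stages
      .filter(o => (o eq s) || overlaps(s, o))
      .flatMap(_.taskPeakExecMemBytes)
    pooled.sorted(Ordering[Long].reverse).take(coresPerExecutor).sum
  }.max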

I know this is a very old issue, but are there any plans to fix this? This is an extremely useful metric to have.