apache / datafusion-ballista

Apache DataFusion Ballista Distributed Query Engine

Home Page: https://datafusion.apache.org/ballista

[feature] Improve the load balancing strategy across machines

smallzhongfeng opened this issue · comments

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Since the current slot allocation strategy only compares the remaining slots of each executor when sorting, we could add a load-balancing comparison method. The hardware of each executor is different, so machines with better performance should be able to run more tasks.

Describe the solution you'd like
I think a better way is to collect CPU, memory, disk, and other statistics, compute a weighted average to produce a score, and then use that score to decide the priority of assigning tasks to a given executor.
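
To make the idea concrete, here is a minimal sketch with hypothetical types and purely illustrative weights (not Ballista's actual scheduler API) of scoring executors and ranking them by that score:

```rust
/// Hypothetical snapshot of an executor's resources, e.g. reported via heartbeat.
struct ExecutorStats {
    cpu_idle_ratio: f64,    // 0.0..=1.0
    free_memory_ratio: f64, // 0.0..=1.0
    free_disk_ratio: f64,   // 0.0..=1.0
}

/// Weighted average of resource metrics; the weights are illustrative only.
fn load_score(s: &ExecutorStats) -> f64 {
    0.5 * s.cpu_idle_ratio + 0.3 * s.free_memory_ratio + 0.2 * s.free_disk_ratio
}

/// Order executor ids so that higher-scoring (less loaded) executors come first.
fn rank_executors(mut executors: Vec<(String, ExecutorStats)>) -> Vec<String> {
    executors.sort_by(|a, b| {
        load_score(&b.1)
            .partial_cmp(&load_score(&a.1))
            .unwrap_or(std::cmp::Ordering::Equal)
    });
    executors.into_iter().map(|(id, _)| id).collect()
}
```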

How about using different task slots for different executors?

Sorry... I don't quite get your point. Could you be more specific about this strategy?

Sorry... I don't quite get your point. Could you be more specific about this strategy?

I believe he means that the number of task slots is configured on a per-executor basis. When the executor registers with the scheduler, it tells the scheduler how many task slots it has available, and the scheduler tracks task slots at the executor level, so there is no need for all executors to have the same number of slots. If you have executors which can handle more concurrency, you can simply configure those executors to register with more task slots.
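
As a rough sketch of that flow (hypothetical types, not Ballista's actual registration protocol), the scheduler only needs per-executor slot counts:

```rust
use std::collections::HashMap;

/// Hypothetical registration message: each executor declares its own slot count.
struct ExecutorRegistration {
    executor_id: String,
    task_slots: u32, // configured per executor; bigger machines register more slots
}

/// The scheduler tracks available slots per executor, so heterogeneous
/// slot counts need no special handling.
struct SlotTracker {
    available: HashMap<String, u32>,
}

impl SlotTracker {
    fn register(&mut self, reg: ExecutorRegistration) {
        self.available.insert(reg.executor_id, reg.task_slots);
    }

    /// Pick the executor that currently has the most free slots.
    fn next_executor(&self) -> Option<&String> {
        self.available
            .iter()
            .filter(|(_, slots)| **slots > 0)
            .max_by_key(|(_, slots)| **slots)
            .map(|(id, _)| id)
    }
}
```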

Thanks for your reply! I get it, but let me give an extreme example: if executor1 on host1 has one core and is configured with 10 slots, and executor2 on host2 has 10 cores and is configured with 2 slots, will executor1 be given priority because it has 10 free slots? In terms of machine performance, executor2 should be assigned tasks with higher priority, right? @thinkharderdev @yahoNanJing

@smallzhongfeng

So the general idea is to allocate slots according to the number of cores:

executor1 has 1 core -> configure this executor with 1 slot
executor2 has 10 cores -> configure this executor with 10 slots

The executors themselves report back on completed tasks, so if a slot is free it will be filled whenever a new task is available.
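
A minimal sketch of that slot lifecycle, with hypothetical names rather than the actual scheduler code: a slot is reserved when a task launches and released when the executor reports completion, so it can be filled by the next available task.

```rust
/// Hypothetical per-executor slot counter; the real scheduler state is more involved.
struct ExecutorSlots {
    total: u32,
    in_use: u32,
}

impl ExecutorSlots {
    /// Reserve a slot when a task is launched on this executor.
    fn try_reserve(&mut self) -> bool {
        if self.in_use < self.total {
            self.in_use += 1;
            true
        } else {
            false
        }
    }

    /// Called when the executor reports a completed task, freeing the slot
    /// so it can be filled as soon as a new task is available.
    fn release(&mut self) {
        self.in_use = self.in_use.saturating_sub(1);
    }
}
```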

@smallzhongfeng

So the general idea is to allocate slots according to the number of cores:

executor1 has 1 core -> configure this executor with 1 slot
executor2 has 10 cores -> configure this executor with 10 slots

Yes, having read the code, I would configure it like this. But for users who don't want to care about the configuration, I think we could add an adaptive capability: for example, automatically obtain the number of cores of the current machine, compute a ratio between the number of cores and the configured slots, and give the executors with a higher ratio priority when assigning tasks (a rough sketch follows below).

If we have many machines with different hardware models, it is not practical for us to modify the configuration manually.
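
As a rough sketch of the adaptive idea above (hypothetical types; following the earlier executor1/executor2 example, the comparison prefers more cores per configured slot):

```rust
/// Hypothetical descriptor combining the configured slot count with the
/// automatically detected core count of the machine.
struct ExecutorProfile {
    executor_id: String,
    configured_slots: u32,
    cpu_cores: u32,
}

impl ExecutorProfile {
    /// Cores available per configured slot; a higher value suggests each task
    /// gets more CPU on this executor.
    fn cores_per_slot(&self) -> f64 {
        self.cpu_cores as f64 / self.configured_slots.max(1) as f64
    }
}

/// Return executor ids ordered so that executors with more cores per slot come first.
fn prefer_order(mut profiles: Vec<ExecutorProfile>) -> Vec<String> {
    profiles.sort_by(|a, b| {
        b.cores_per_slot()
            .partial_cmp(&a.cores_per_slot())
            .unwrap_or(std::cmp::Ordering::Equal)
    });
    profiles.into_iter().map(|p| p.executor_id).collect()
}
```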

If we have many machines with different hardware models, it is not practical for us to modify the configuration manually.

The executor will by default use the number of available cores for its concurrent task slot configuration, so you shouldn't need any special configuration if you're okay with that default.
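
For reference, a minimal sketch of deriving such a default from the machine the executor runs on, using only the Rust standard library (the actual Ballista configuration wiring may differ):

```rust
use std::thread;

/// Fall back to the number of cores visible to the process when the user has
/// not configured the concurrent task slots explicitly.
fn default_task_slots(configured: Option<usize>) -> usize {
    configured.unwrap_or_else(|| {
        thread::available_parallelism()
            .map(|n| n.get())
            .unwrap_or(1)
    })
}

fn main() {
    println!("task slots: {}", default_task_slots(None));
}
```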

#832 (comment)

What do you think of this proposal? @thinkharderdev

#832 (comment)

What do you think of this proposal? @thinkharderdev

Seems reasonable

Could I raise a PR for it? :-)

If we have many machines with different hardware models, it is not practical for us to modify the configuration manually.

This part can be done in the executor startup script. In the script we can get the node's CPU core count if we want and then set the config. In my view, the CPU core count is a redundant resource metric. What's more, in some cases multiple executors may reside on one machine, so it's not good to depend on the node's CPU core count for task assignment.

This part can be done in the executor startup script

Good idea!

In my view, the CPU core count is a redundant resource metric. What's more, in some cases multiple executors may reside on one machine, so it's not good to depend on the node's CPU core count for task assignment.

With multiple executors on one machine, this problem is indeed not solved. But for one executor per machine, I think taking the CPU core count into account would be more accurate.

taking the CPU core count into account would be more accurate

I don't think so. The executor slot is a virtual concept describing how many tasks an executor can handle concurrently. It is related to the CPU core count, but they are different things. For some machines it may be better to assign one task per core, while for others two or three, depending on whether the tasks are CPU-bound or I/O-bound. Therefore we should use a more general notion to indicate task concurrency.

Actually, in our production environment we use memory size rather than CPU core count to determine task concurrency. The idea is similar to the one used by Hadoop systems.
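
A minimal sketch of both sizing approaches, with purely illustrative numbers (hypothetical helpers, not the production configuration): a per-core factor for CPU-bound versus I/O-bound workloads, and a memory budget per task in the spirit of Hadoop-style resource managers.

```rust
/// Illustrative only: derive task concurrency from a per-core factor.
fn slots_from_cores(cores: usize, tasks_per_core: usize) -> usize {
    // CPU-bound workloads might use tasks_per_core = 1; I/O-bound ones 2 or 3.
    cores * tasks_per_core
}

/// Illustrative only: derive task concurrency from a memory budget per task,
/// similar in spirit to Hadoop-style resource managers.
fn slots_from_memory(total_memory_mb: usize, memory_per_task_mb: usize) -> usize {
    (total_memory_mb / memory_per_task_mb).max(1)
}

fn main() {
    println!("I/O-bound, 8 cores:  {}", slots_from_cores(8, 2));
    println!("32 GiB, 2 GiB/task:  {}", slots_from_memory(32 * 1024, 2 * 1024));
}
```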

I agree with @yahoNanJing here. Having a default where task slots = CPU cores is reasonable, but different workloads have different constraints. Sometimes the bottleneck is CPU, sometimes memory (e.g. if you do a lot of joins and high-cardinality aggregations), and it could even be network bandwidth or disk size. Trying to "derive" the task slots from static config values might just get confusing and complicated, so a more maintainable approach is to have a sensible, easily explained default and then let users configure task concurrency based on their use case and whatever parameters they want to include. As mentioned, this can easily be accomplished with a script which picks the right task concurrency and then runs the executor binary, passing the value to the existing config option.

That's OK, I understand now. Thank you very much for your answers; I will close this issue for now.