groupon / luigi-warehouse

A luigi powered analytics / warehouse stack

Feedback on Luigi host_resources

bynr opened this issue · comments

commented

Hi there, (@Benyuel maybe?)

It seems you are using a version of the host_resources feature proposed for Luigi in spotify/luigi#1669.
If so, do you have any feedback on running it in production?

I would very much like to see this feature merged into luigi, so any remarks are welcome!

PS: Sorry for the fake issue. I thought it would be the most straightforward way to contact you, so feel free to close it.

@BinrOp: We're running it in production, somewhat ad hoc. We use it to parallelize workflows further when jobs start to get slow. For example, host resources let us limit what each of our hosts can run simultaneously, so as not to overload any one host. The same mechanism also lets us share the workload across hosts: we kick off the same task/workflow on multiple identical hosts without restarting the entire workflow. We don't use it often, but the trick I've found is to make the workflow capable of running in parallel and to standardize each luigi client host.
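
To make the per-host limit idea concrete, here is a minimal sketch in plain Python. It is a toy model only, not Luigi's scheduler and not the code from spotify/luigi#1669. The class name `HostResourceScheduler`, the host names, and the `cpu_heavy` resource are all made up for illustration, and refusing resources that have no configured limit is a choice of this sketch.

```python
from collections import defaultdict


class HostResourceScheduler:
    """Toy model of per-host resource limits (hypothetical, not Luigi's API)."""

    def __init__(self, host_limits):
        # host_limits maps host -> {resource_name: max_units}
        self.host_limits = host_limits
        self.in_use = defaultdict(lambda: defaultdict(int))

    def try_assign(self, host, needs):
        """Reserve `needs` on `host` only if every limit still allows it."""
        limits = self.host_limits.get(host, {})
        for name, amount in needs.items():
            # Assumption of this sketch: a resource with no limit is refused.
            if self.in_use[host][name] + amount > limits.get(name, 0):
                return False
        for name, amount in needs.items():
            self.in_use[host][name] += amount
        return True

    def release(self, host, needs):
        """Give back the units reserved by a finished task."""
        for name, amount in needs.items():
            self.in_use[host][name] -= amount


scheduler = HostResourceScheduler({
    "host-a": {"cpu_heavy": 1},
    "host-b": {"cpu_heavy": 1},
})
needs = {"cpu_heavy": 1}
first = scheduler.try_assign("host-a", needs)   # host-a accepts one task
second = scheduler.try_assign("host-a", needs)  # host-a is at its limit
third = scheduler.try_assign("host-b", needs)   # the same work goes to host-b
```

The point of the model is that the limit is tracked per host, so a second identical host adds capacity instead of competing for one global counter.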

commented

Thanks for taking the time to answer! :D

Is standardizing each luigi client host really necessary? I tried to specify different host resources in the luigi.cfg read by the client for each host and resources were still distributed as expected. Maybe I misunderstood what you meant?

To summarize, this feature is really great for scaling luigi jobs!

Standardizing each luigi client host isn't necessary from a task scheduling / resource distribution perspective.
I meant it more from a task development / parallelizing perspective. A simple high-level example: suppose you have a workflow that interacts with the local file system, and you want to be able to add workers from different hosts to that workflow. The easiest way to make that work is for each host to have the same file system structure / data, so that your task runs successfully against the local file system on any of your available hosts.
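
As a rough illustration of what "same file system structure" could mean in practice, the sketch below checks that a host has the directories a workflow expects before a worker is started there. The layout in `REQUIRED_PATHS` and the helper `missing_paths` are hypothetical names for this example, not part of luigi or of this repository.

```python
import tempfile
from pathlib import Path

# Hypothetical layout that every client host is expected to share.
REQUIRED_PATHS = ("input/events", "output/reports")


def missing_paths(root, required=REQUIRED_PATHS):
    """Return the required relative paths that are not directories under root."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).is_dir()]


# Demo against a temporary directory standing in for one host's data root.
with tempfile.TemporaryDirectory() as root:
    before = missing_paths(root)          # nothing has been created yet
    for rel in REQUIRED_PATHS:
        (Path(root) / rel).mkdir(parents=True)
    after = missing_paths(root)           # layout now matches the convention
```

A check like this could run once per host before starting a worker, so a task that reads or writes local files fails early on a host that does not match the others.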