Distributed Execution
bubbajoe opened this issue · comments
Hello,
I am very new to Rust, so please bear with me.
I would like to run a query over a large amount of data (50 GB of Parquet files) across multiple executors, but I am wondering how Ballista handles this. Can it execute a heavy workload like this even if the node running it only has 16 GB of memory?
How can I determine the memory required for an execution plan?
Does Ballista execute a single query on multiple executors? If not, how can I optimize for this?
- I'm not sure how you would determine the appropriate amount of memory without just trying it out. Ballista by no means loads all 50 GB into memory at once; it breaks the data up into smaller RecordBatches for processing.
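Since memory is driven by in-flight batches rather than total file size, a back-of-envelope estimate can help. The sketch below is purely illustrative arithmetic: the 8192-row batch size matches DataFusion's default `batch_size`, but the bytes-per-row figure is a made-up assumption you would replace with your own schema's width.

```rust
// Rough per-batch memory estimate. The numbers here are illustrative
// assumptions, not guarantees about what Ballista will actually allocate.
fn batch_memory_bytes(rows_per_batch: usize, bytes_per_row: usize) -> usize {
    rows_per_batch * bytes_per_row
}

fn main() {
    // e.g. 8192 rows (DataFusion's default batch size) x ~200 bytes/row
    let per_batch = batch_memory_bytes(8192, 200);
    println!("~{} bytes per in-flight RecordBatch", per_batch);
    // Peak usage is roughly per_batch x (batches held concurrently across
    // operators), plus any pipeline-breaking state such as sort buffers
    // or hash-join tables, which can be much larger.
}
```

Note that pipeline-breaking operators (sorts, aggregations, hash joins) buffer data beyond a single batch, so this only bounds the streaming part of the plan.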
- Ballista will run your query on as many executors as it can successfully parallelize across (likely as many as you give it, depending on the query).
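To give a query more executors to parallelize across, you start one scheduler and register multiple executors with it. This is a sketch of a local cluster launch; the binary names match the Ballista project, but the exact flag names vary between versions, so treat them as assumptions and check `--help` for your release.

```shell
# Start the scheduler (commonly on port 50050).
ballista-scheduler --bind-port 50050 &

# Register several executors with it (these could also run on other
# machines). Each executor adds capacity for running query partitions
# in parallel -- flag names are assumed from Ballista 0.x and may differ.
ballista-executor --bind-port 50051 --scheduler-host localhost --scheduler-port 50050 &
ballista-executor --bind-port 50052 --scheduler-host localhost --scheduler-port 50050 &
```

With multiple executors registered, the scheduler splits the plan into stages and distributes their partitions, so a single query does run across executors.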