ArroyoSystems / arroyo

Distributed stream processing engine in Rust

Home Page: https://arroyo.dev

The UI should allow users to change parallelism

mwylde opened this issue

Pipelines have an associated parallelism configuration that controls how many parallel subtasks we run for each operator. (Inside the dataflow itself, we support operators having different parallelism, but the current API only allows setting a single parallelism across the entire job.)

This parallelism is set to an inferred value at pipeline creation. However, that value may be too high or too low depending on the actual data volume and the complexity of the query.

There is a gRPC API (UpdateJob) that allows users to change the parallelism of a running job, but it is not currently exposed in the Web UI.

This issue covers adding the ability to change the parallelism from the job details page (http://localhost:8000/jobs/{job_id}).
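As a rough sketch of what the job details page could do before calling UpdateJob, the helper below validates the user's input and builds a request object. The `UpdateJobRequest` shape, its field names, the `MAX_PARALLELISM` bound, and `buildUpdateRequest` itself are all assumptions made for illustration; the real UpdateJob message may look different.

```typescript
// Hypothetical request shape: the actual UpdateJob message may use other field names.
interface UpdateJobRequest {
  jobId: string;
  parallelism: number;
}

type BuildResult =
  | { status: "ok"; request: UpdateJobRequest }
  | { status: "invalid"; error: string };

// Assumed upper bound for the form input; this is not a documented Arroyo limit.
const MAX_PARALLELISM = 1024;

// Validate the raw text from the parallelism input and build a request,
// or return a message the UI can show next to the field.
function buildUpdateRequest(
  jobId: string,
  rawInput: string,
  currentParallelism: number
): BuildResult {
  const trimmed = rawInput.trim();
  // Accept only plain decimal digits (rejects "", "abc", "-2", "1.5").
  if (!/^\d+$/.test(trimmed)) {
    return { status: "invalid", error: "Parallelism must be a positive whole number" };
  }
  const parallelism = Number(trimmed);
  if (parallelism < 1 || parallelism > MAX_PARALLELISM) {
    return {
      status: "invalid",
      error: `Parallelism must be between 1 and ${MAX_PARALLELISM}`,
    };
  }
  // Skip no-op updates, since any change restarts the job.
  if (parallelism === currentParallelism) {
    return { status: "invalid", error: "Parallelism is unchanged" };
  }
  return { status: "ok", request: { jobId, parallelism } };
}
```

Rejecting an unchanged value on the client avoids triggering a restart of the job for no effect.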

Note that because we do not currently support dynamic rescaling of pipelines, changing the parallelism triggers this sequence in the controller:

  1. The job is stopped with a final checkpoint
  2. The existing workers are shut down
  3. A new job is scheduled with the new parallelism

This can take several seconds, and the UI should reflect that the change is rolling out.
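One way the UI could represent the rollout is as a small state machine that mirrors the three controller steps above. The phase names, labels, and helper functions here are hypothetical and chosen only for illustration; they are not Arroyo's actual job states, and how the UI learns the current phase (polling, streaming) is left open.

```typescript
// Hypothetical phases mirroring the controller sequence; not Arroyo's real job states.
type RescalePhase = "checkpointing" | "stopping_workers" | "scheduling" | "running";

// Each phase's successor; "running" is terminal.
const NEXT_PHASE: Record<RescalePhase, RescalePhase> = {
  checkpointing: "stopping_workers",
  stopping_workers: "scheduling",
  scheduling: "running",
  running: "running",
};

// Human-readable text for the job details page.
function phaseLabel(phase: RescalePhase, targetParallelism: number): string {
  switch (phase) {
    case "checkpointing":
      return "Stopping job with a final checkpoint";
    case "stopping_workers":
      return "Shutting down existing workers";
    case "scheduling":
      return `Scheduling job with parallelism ${targetParallelism}`;
    case "running":
      return `Running with parallelism ${targetParallelism}`;
  }
}

interface RolloutStatus {
  label: string;
  // True while the change is still rolling out, e.g. to show a spinner
  // and disable the parallelism input.
  inProgress: boolean;
}

function rolloutStatus(phase: RescalePhase, targetParallelism: number): RolloutStatus {
  return {
    label: phaseLabel(phase, targetParallelism),
    inProgress: phase !== "running",
  };
}

// Advance to the next phase, e.g. when the backend reports a step finished.
function advance(phase: RescalePhase): RescalePhase {
  return NEXT_PHASE[phase];
}
```

Keeping `inProgress` separate from the label lets the page disable the control during the several seconds a rollout takes, regardless of which step is active.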