ArroyoSystems / arroyo

Distributed stream processing engine in Rust

Home Page:https://arroyo.dev

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Runtime failure when using `source.offset` as `group` when starting a job for the first time

prakkashm opened this issue · comments

For a kafka source, when I specify source.offset as group with a non-existing source.group_id while starting a job for the first time, it fails during runtime without any errors. Possibly, the job is unable to create the new kafka consumer group.

Thanks for the report! This is actually the intended behavior, but could definitely be documented better.

'source.offset' = 'group' means that we should start from an existing consumer group (specified by source.group_id), rather than initial or latest. If there's no existing consumer group than I don't think there's a sensible behavior here, so we fail.

This feature exists for cases where you have an existing consumer group (for example, from an earlier arroyo job) and you want to pick up where that job left off. If you want to create a new consumer group, choose either 'earliest' or 'latest'.

Sure. As an end user, I would then expect an error on the UI about the non-existing consumer group.
I'm personally of the opinion, it should behave like earliest when the consumer group is non-existent. This helps in avoiding the additional manual overhead (of changing the earliest to group) that a user must do once if they need to restart an existing job to continue from where it stopped the first time.