ArroyoSystems / arroyo

Distributed stream processing engine in Rust

Home Page: https://arroyo.dev

Why is rx not equal to tx in the tumbling_local_aggregator module?

javacoderxxp opened this issue

In the first pipeline example, why does the tumbling_local_aggregator_6 module show events rx = 92 eps but tx = 32 eps?
When I copy the pipeline and set the nexmark source to 10000 events/s, tumbling_local_aggregator_6 shows events rx = 9242 eps but tx = 641 eps. Why?

Hi there! Thanks for the question. The tumbling local aggregator performs partial aggregation before sending data on to the downstream node, so it emits fewer records than it receives, which is why tx is lower than rx. Because that edge is a shuffle, it will, in general, cross a network boundary, requiring serialization and deserialization, so reducing the amount of data sent across it saves work. This is similar to the combiner found in some MapReduce systems, including Hadoop.
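To make the rx/tx gap concrete, here is a minimal sketch of local pre-aggregation in a tumbling window. It is not Arroyo's actual implementation: the `Event` type, the `local_aggregate` function, and the 1-second window are hypothetical names and values chosen only for illustration. It shows many input events collapsing into one partial aggregate per (window, key) before the shuffle.

```rust
use std::collections::HashMap;

/// Hypothetical event type used only for this sketch (not an Arroyo type).
#[derive(Debug, Clone)]
struct Event {
    key: String,
    timestamp_ms: u64,
    value: u64,
}

/// Combine events locally into one partial sum per (window start, key).
/// Only these partial sums would be sent across the shuffle edge.
fn local_aggregate(events: &[Event], window_ms: u64) -> HashMap<(u64, String), u64> {
    let mut partials = HashMap::new();
    for e in events {
        // Align the timestamp to the start of its tumbling window.
        let window_start = e.timestamp_ms - e.timestamp_ms % window_ms;
        *partials.entry((window_start, e.key.clone())).or_insert(0) += e.value;
    }
    partials
}

fn main() {
    // 90 events spread over 3 keys, all inside a single 1-second window.
    let events: Vec<Event> = (0..90u64)
        .map(|i| Event {
            key: format!("key-{}", i % 3),
            timestamp_ms: i * 10,
            value: 1,
        })
        .collect();

    let partials = local_aggregate(&events, 1000);

    // rx counts the events received; tx counts the partial aggregates emitted.
    println!(
        "rx = {} events, tx = {} partial aggregates",
        events.len(),
        partials.len()
    );
}
```

In this sketch the ratio of tx to rx depends on how many distinct keys appear per window, which would be consistent with the gap widening as the source rate increases.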

Hope this answers your question!

Thank you.