lovoo / goka

Goka is a compact yet powerful distributed stream processing library for Apache Kafka written in Go.



Question: Alternative for loop?

chin8628 opened this issue

Hi, I have a problem using a loop.

I found that when using a loop, the input and loop topics must have the same number of partitions.
However, when the partition counts match, Goka calls DescribeConfig.

The problem is that the Kafka service at my company doesn't give us that permission, so every time DescribeConfig is called, `ErrTopicAuthorizationFailed` is returned.

I tried to use input/output edges instead of a loop, but got the error `should not directly use loop stream`.

Do you have any ideas?

Hi @chin8628,

You're right: within a processor, all inputs (and the loop is an input too) have to be co-partitioned. There's no way around that. If you can't ensure it, you need multiple processors.
Anyway, there's a check that verifies topic configurations are as required, to protect the user from a wrong configuration that would otherwise go unnoticed and mess up the state and everything.
However, this check can be turned off by setting the topic manager configuration's mismatch behavior to TMConfigMismatchBehaviorIgnore.
Second however: this won't help in your case either, because the call to DescribeConfig is made anyway.

Third however: the check should only be done when goka attempts to create the topic, which it does for the loop topic. So try this:

goka.NewProcessor(...
	goka.DefineGroup(group,
		// Output and Input share one name, acting as a self-managed loop topic
		goka.Output("proc-self-topic", codec),
		goka.Input("proc-self-topic", codec, handler),
	),
)

So basically, you define the loop topic yourself by declaring an Output and an Input with the same name. This also means you have to create that topic yourself. I'm not entirely sure it works, though; we should write a system test for it :)

Another way would be to skip the call to DescribeConfig when the topic manager's config says to ignore its result anyway. We could try to implement that small change in the next couple of days.

What do you think?

Today I tried using an input and an output with the same name, without applying any custom settings to the group. I also gave the topic a name whose suffix differs from the loop's, to avoid the `should not directly use loop stream` validation.
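The validation this workaround sidesteps can be modeled roughly as follows. This is a hypothetical sketch inferred from the error message in this thread: it assumes goka names the loop topic `<group>-loop` and rejects explicitly declared inputs or outputs whose topic ends in that suffix. The `edge` type and `validateEdges` function are invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// loopSuffix is assumed to be the suffix appended to the group name.
const loopSuffix = "-loop"

// edge is a hypothetical stand-in for an Input or Output declaration.
type edge struct {
	kind  string // "input" or "output"
	topic string
}

// validateEdges rejects user-declared edges that point at a loop topic.
func validateEdges(edges []edge) error {
	for _, e := range edges {
		if strings.HasSuffix(e.topic, loopSuffix) {
			return errors.New("should not directly use loop stream")
		}
	}
	return nil
}

func main() {
	rejected := []edge{{"output", "proc-loop"}, {"input", "proc-loop"}}
	accepted := []edge{{"output", "proc-self-topic"}, {"input", "proc-self-topic"}}
	fmt.Println(validateEdges(rejected)) // should not directly use loop stream
	fmt.Println(validateEdges(accepted)) // <nil>
}
```

Under this assumption, any self-managed topic name that does not end in `-loop`, such as `proc-self-topic`, passes the check.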

I found that it skips calling DescribeConfig, so it basically works well.
But I'm confused now. What is the condition for calling DescribeConfig?

> Another way would be to omit the call to DescribeConfig if the topic manager's config specifies to ignore its result anyway. We could try to implement that little change in the next couple of days.

Honestly, I have no idea whether ignoring the result of DescribeConfig is a good idea or not, because I'm new to Kafka.

> But I'm confused now. What is the condition for calling DescribeConfig?

The reason for this check was actually an earlier bug that created a topic with an unsuitable configuration. When a goka processor requires state (by specifying goka.Persist in the group graph), it creates a new Kafka topic that is used to store that state. That topic needs to be log-compacted (cleanup.policy=compact), because retention-based deletion (cleanup.policy=delete) would remove old entries. The bug created the topic without log compaction, which would go unnoticed and eventually lead to data loss, which was fatal in our case. To avoid that in the future, we did two things:

  • obviously fix the bug
  • check that all topics goka is responsible for creating have the correct configuration

The second point is more of a safety measure: a double-check to catch other issues and bugs earlier. But it requires calling DescribeConfig.
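Why a table topic created with the wrong cleanup policy only shows up as data loss much later can be seen in a toy model. It assumes simplified Kafka `cleanup.policy` semantics (`compact` keeps the latest record per key, `delete` drops records older than the retention period) and that a processor rebuilds its table by replaying the topic. The `record` type and the `compact`, `retain`, and `restore` functions are hypothetical, not goka or Kafka code.

```go
package main

import "fmt"

// record is one message in a hypothetical table topic, oldest first.
type record struct {
	key, value string
	ageHours   int
}

// compact models cleanup.policy=compact: keep only the latest record per key.
func compact(log []record) []record {
	latest := map[string]int{}
	for i, r := range log {
		latest[r.key] = i
	}
	var out []record
	for i, r := range log {
		if latest[r.key] == i {
			out = append(out, r)
		}
	}
	return out
}

// retain models cleanup.policy=delete: drop records older than the retention.
func retain(log []record, retentionHours int) []record {
	var out []record
	for _, r := range log {
		if r.ageHours <= retentionHours {
			out = append(out, r)
		}
	}
	return out
}

// restore rebuilds table state by replaying the topic, as after a restart.
func restore(log []record) map[string]string {
	state := map[string]string{}
	for _, r := range log {
		state[r.key] = r.value
	}
	return state
}

func main() {
	log := []record{
		{"alice", "1", 500}, // written long ago, never updated since
		{"bob", "1", 400},
		{"bob", "2", 10},
	}
	fmt.Println(restore(compact(log)))     // map[alice:1 bob:2]
	fmt.Println(restore(retain(log, 168))) // map[bob:2]: alice's state is gone
}
```

Both policies restore the same state for recently updated keys, which is why the misconfiguration can go unnoticed until a rarely updated key disappears.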

The point is, this check is done for all topics that goka creates. If you created the topics yourself, it's safe to turn the check off. So we should be able to skip it in use cases where the call fails for whatever reason.

So... good to know the workaround works for you! But let's keep the issue open until someone implements it, because it's easy to do and can save some debugging time.