benthosdev / benthos

Fancy stream processing made operationally mundane

Home Page: https://www.benthos.dev

`schema_registry_encode` Race Condition

hendoxc opened this issue

Hey, I'm getting a very strange bug using `schema_registry_encode`.

My application intermittently fails to encode some percentage of messages. When I check the logs to see why, I see:

could not decode any json data in input

followed by the JSON message that it's trying to encode, and then `for key xyz`.

Okay, fine, that makes sense. But when I actually look at the JSON input printed in the error message, I can see that every time the leading `{"key_name":` is missing/truncated for some reason.

For example, if it was receiving a message like:

{"key_1": "value", "key_2": 12}

the error message would look like:

cannot decode textual record \"com.data\": could not decode any json data in input "value", "key_2": 12} for key "key_1"

with the `{"key_1":` truncated from the JSON.

I know the message in the Benthos pipeline is fine, because I'm catching the error, logging the message that caused it, and sending the message to a DLQ to inspect further. There the JSON is fine and contains all the expected keys.

This happens on a high-throughput topic, and the error frequency seems to depend on the refresh period set in `schema_registry_encode`: the smaller the refresh period, the more often the application intermittently throws these errors for some percentage of messages before returning to normal operation.
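For anyone trying to reproduce this, the error handling and refresh setting described above might look roughly like the config fragment below. It is only a sketch under assumptions: the registry URL, broker address, subject, topic names, the `dlq` metadata key and the `10s` refresh period are made-up placeholders, not values from the affected deployment.

```yaml
pipeline:
  processors:
    - schema_registry_encode:
        url: http://localhost:8081        # hypothetical registry address
        subject: example-value            # hypothetical subject name
        refresh_period: 10s               # deliberately short; shorter periods reportedly made the errors more frequent

    # Runs only for messages that failed a previous processor.
    - catch:
        - log:
            level: ERROR
            message: 'encode failed: ${! error() }, payload: ${! content() }'
        # Flag the message so the output can route it to the DLQ
        # (the metadata key name is an arbitrary choice).
        - mapping: 'meta dlq = "true"'

output:
  switch:
    cases:
      - check: meta("dlq") == "true"
        output:
          kafka_franz:
            seed_brokers: [ localhost:9092 ]   # hypothetical broker
            topic: example-dlq                 # hypothetical DLQ topic
      - output:
          kafka_franz:
            seed_brokers: [ localhost:9092 ]
            topic: example-out                 # hypothetical output topic
```

Because a failed processor leaves the message contents unchanged, the payload that gets logged and sent to the DLQ is the original JSON, which is how the intact message could be compared against the truncated one in the error text.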

Hey @hendoxc, I'm taking a look for any race conditions within the processor itself. It'd also help if you could give a general overview of what your config is doing, including:

  • The input type that is producing the messages that are failing
  • Any processors that come specifically before the schema registry processor
  • Whether any custom plugins are a part of the build you're running

I managed to find a race condition and have a fix for it: 3c301bb

Let's call this a speculative fix. Are you able/willing to run a nightly build to try this out?

Yes, I can pull in the latest Benthos commit and try it out.

input: kafka_franz

As for the processors beforehand, I'm just doing a `schema_registry_decode`, then some mapping processors.
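Going by that description, the overall shape of the config would be something like the sketch below. The broker address, topic, consumer group, registry URL, subject and the contents of the mapping are all placeholders invented for illustration; the real config was not posted in this thread.

```yaml
input:
  kafka_franz:
    seed_brokers: [ localhost:9092 ]   # hypothetical broker
    topics: [ example-in ]             # hypothetical topic
    consumer_group: example-group      # hypothetical group

pipeline:
  processors:
    # Decode using the schema referenced by the incoming message.
    - schema_registry_decode:
        url: http://localhost:8081     # hypothetical registry address

    # Stand-in for the "some mapping processors" step.
    - mapping: |
        root = this

    # Re-encode against the latest schema for the subject, which is
    # re-fetched from the registry every refresh_period.
    - schema_registry_encode:
        url: http://localhost:8081
        subject: example-out-value     # hypothetical subject name
        refresh_period: 10m
```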

@Jeffail I've been running a couple of applications with the commit SHA provided and no longer see the issue. I'd say this is resolved.

Awesome, thanks @hendoxc!