`schema_registry_encode` Race Condition
hendoxc opened this issue · comments
Hey, I'm getting a very strange bug using `schema_registry_encode`.
My application intermittently fails to encode some % of messages. When I check the logs to see why, I see
could not decode any json data in input
followed by the JSON message that it's trying to encode, then `for key xyz`.
Okay, fine, that makes sense. But when I actually look at the JSON input printed in the error message, I can see that every time the leading `{"key_name":` is missing/truncated for some reason.
For example, if it was receiving a message like
{"key_1": "value", "key_2": 12}
the error message would look like:
cannot decode textual record \"com.data\": could not decode any json data in input "value", "key_2": 12} for key "key_1"
with the `{"key_1":` truncated from the JSON.
I know the message in the Benthos pipeline is fine, because I'm catching the error, logging the message that caused it, and sending the message to a DLQ to inspect further, where the JSON is fine and contains all the expected keys.
This happens on a high-throughput topic, and the error frequency seems to depend on the refresh period set in `schema_registry_encode`: the smaller the refresh period, the more often the application intermittently throws these errors for a % of messages before returning to normal operation.
Hey @hendoxc, I'm taking a look for any race conditions within the processor itself, it'd also help if you can give a general overview of what your config is doing, including:
- The input type that is producing the messages that are failing
- Any processors that come specifically before the schema registry processor
- Whether any custom plugins are a part of the build you're running
I managed to find a race condition and have a fix for it: 3c301bb
Let's call this a speculative fix, are you able/willing to run a nightly build to try this out?
yes, can pull in the latest benthos commit and try it out.
input: kafka_franz
As for the processors beforehand, I'm just doing a `schema_registry_decode`, then some `mapping` processors.
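For reference, the setup described in this thread might look roughly like the config below. This is only a sketch under assumptions: the broker address, registry URL, topic names, consumer group, subject, and the short `refresh_period` are hypothetical placeholders, the `mapping` body is a stand-in, and the DLQ routing via `errored()` is one possible way to do the error handling mentioned above, not necessarily the reporter's actual config.

```yaml
input:
  kafka_franz:
    seed_brokers: [ "localhost:9092" ]   # hypothetical broker
    topics: [ "example_in" ]             # hypothetical topic
    consumer_group: "example_group"      # hypothetical group

pipeline:
  processors:
    - schema_registry_decode:
        url: http://localhost:8081       # hypothetical registry URL
    - mapping: |
        # stand-in for the "some mapping processors" step
        root = this
    - schema_registry_encode:
        url: http://localhost:8081
        subject: example_subject         # hypothetical subject
        # A short refresh period reportedly made the failures more frequent.
        refresh_period: 10s

output:
  switch:
    cases:
      # Messages that failed a processor keep their original JSON payload
      # and are routed to a dead-letter topic for inspection.
      - check: errored()
        output:
          kafka_franz:
            seed_brokers: [ "localhost:9092" ]
            topic: example_dlq           # hypothetical DLQ topic
          processors:
            - log:
                level: ERROR
                message: 'encode failed: ${! error() }'
      - output:
          kafka_franz:
            seed_brokers: [ "localhost:9092" ]
            topic: example_out           # hypothetical output topic
```

With a layout like this, a failed encode leaves the message contents untouched, which matches the observation that the payload in the DLQ is intact even though the logged error shows truncated input.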
@Jeffail I've been running a couple of applications with the commit SHA provided and no longer see the issue. I'd say this is resolved.
Awesome, thanks @hendoxc!