Sometimes a transaction error occurs - Cannot call send in state COMMITTING_TRANSACTION
astubbs opened this issue · comments
While running and publishing messages back to Kafka (pollAndProduce), sometimes a transaction error occurs in the log:
java.lang.IllegalStateException: Cannot call send in state COMMITTING_TRANSACTION
This is due to some error in the way transaction state is managed / monitored by the system.
Also, as reported by a user:
It doesn’t get stuck on Cannot call send in state COMMITTING_TRANSACTION, it just re-processes.
However, sometimes I get Invalid transition attempted from state READY to state COMMITTING_TRANSACTION on startup and then an infinite loop occurs. I’ve only seen this happening when I set max.poll.records very low.
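To make the two error messages easier to reason about, here is a toy state machine that produces them. It is only a sketch: the class `ToyTransactionState` and its methods are hypothetical and are not Kafka's actual transaction manager, and the assumption that a send races with an in-progress commit is one plausible reading of the reports above, not a confirmed diagnosis.

```java
// Toy model only: NOT Kafka's real transaction manager. All names are hypothetical.
class ToyTransactionState {
    enum State { READY, IN_TRANSACTION, COMMITTING_TRANSACTION }

    private State state = State.READY;

    // Allowed cycle in this model: READY -> IN_TRANSACTION -> COMMITTING_TRANSACTION -> READY.
    private void transitionTo(State target) {
        boolean valid;
        switch (target) {
            case IN_TRANSACTION:
                valid = state == State.READY;
                break;
            case COMMITTING_TRANSACTION:
                valid = state == State.IN_TRANSACTION;
                break;
            case READY:
                valid = state == State.COMMITTING_TRANSACTION;
                break;
            default:
                valid = false;
        }
        if (!valid) {
            throw new IllegalStateException(
                "Invalid transition attempted from state " + state + " to state " + target);
        }
        state = target;
    }

    synchronized void beginTransaction() { transitionTo(State.IN_TRANSACTION); }

    // A send is only legal while a transaction is open and not yet committing.
    synchronized void send(String record) {
        if (state != State.IN_TRANSACTION) {
            throw new IllegalStateException("Cannot call send in state " + state);
        }
        // A real producer would enqueue the record here; the toy model does nothing.
    }

    synchronized void beginCommit() { transitionTo(State.COMMITTING_TRANSACTION); }

    synchronized void completeCommit() { transitionTo(State.READY); }

    synchronized State currentState() { return state; }
}
```

In this model, a worker thread calling `send` after another thread has called `beginCommit` (but before `completeCommit`) gets the first error, and calling `beginCommit` before any `beginTransaction` gets the second.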
I've contributed a small test-case that reproduces some of the issues we discussed on slack: JorgenRingen@9d5fd91
By tweaking the parallel-consumer and kafka-consumer settings I can reproduce Invalid transition attempted from state READY to state COMMITTING_TRANSACTION. Should happen on maxPolledRecords .
Also reproduced messagesProcessed > messagesProduced and Cannot call send in state COMMITTING_TRANSACTION (as the test is now).
I get somewhat different behavior by tweaking parallel-consumer options and max.poll.records. It's not completely deterministic.
Afternoon @JorgenRingen, I believe I have fixed the issue with transactions / commit state, and now made transactions optional as well: https://github.com/confluentinc/parallel-consumer/pull/31/files
Take a look at the interface and let me know what you think - specifically this choice: https://github.com/confluentinc/parallel-consumer/pull/31/files#diff-12c3d1f966a5367ab47be49f7c0f11a9dcd4cf339b73acb092b8c44ae9c9a6e2R45
Evening @astubbs, great with optional transactions. Adding the parameter and verifying by IllegalArgumentException'ing seems like a pragmatic and fair approach to me. However, if introducing support for an optional producer, the parameter might be a little confusing. Don't have any immediate ideas on how to improve. Maybe some "fluent style" (which might be total overkill) like ParallelConsumerOptions.with[outProducer|NonTransactionProducer|TransactionalProducer]().numberOfThreads(...).maxConcurrency(...)
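For illustration, the "fluent style" idea could look roughly like the sketch below. Everything here is hypothetical: `FluentOptionsSketch`, `ProducerMode`, the factory method names, and the placeholder starting values are assumptions made for this example and are not the library's `ParallelConsumerOptions` API.

```java
// Hypothetical sketch of a fluent options API; not the real ParallelConsumerOptions.
final class FluentOptionsSketch {
    enum ProducerMode { NONE, NON_TRANSACTIONAL, TRANSACTIONAL }

    private final ProducerMode producerMode;
    private int numberOfThreads = 16; // placeholder starting value for this sketch
    private int maxConcurrency = 16;  // placeholder starting value for this sketch

    private FluentOptionsSketch(ProducerMode producerMode) {
        this.producerMode = producerMode;
    }

    // The producer mode is chosen up front, so no separate boolean parameter is needed.
    static FluentOptionsSketch withoutProducer() {
        return new FluentOptionsSketch(ProducerMode.NONE);
    }

    static FluentOptionsSketch withNonTransactionalProducer() {
        return new FluentOptionsSketch(ProducerMode.NON_TRANSACTIONAL);
    }

    static FluentOptionsSketch withTransactionalProducer() {
        return new FluentOptionsSketch(ProducerMode.TRANSACTIONAL);
    }

    FluentOptionsSketch numberOfThreads(int value) {
        requirePositive("numberOfThreads", value);
        this.numberOfThreads = value;
        return this;
    }

    FluentOptionsSketch maxConcurrency(int value) {
        requirePositive("maxConcurrency", value);
        this.maxConcurrency = value;
        return this;
    }

    // Invalid values are still rejected with IllegalArgumentException.
    private static void requirePositive(String name, int value) {
        if (value <= 0) {
            throw new IllegalArgumentException(name + " must be positive, got " + value);
        }
    }

    ProducerMode getProducerMode() { return producerMode; }
    int getNumberOfThreads() { return numberOfThreads; }
    int getMaxConcurrency() { return maxConcurrency; }
}
```

Usage would read `FluentOptionsSketch.withTransactionalProducer().numberOfThreads(8).maxConcurrency(100)`. The trade-off is more API surface in exchange for making an invalid combination of producer and transaction flag impossible to express.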
I actually found a bug in the test:
https://github.com/confluentinc/parallel-consumer/pull/31/files#diff-d6d31b42ae96a5e31f2793c52624720af3a084c434707a9f031969a7af1b4e14R96 <- this line would always override maxPollRecords instrumented by the tests.
I deleted the line, added a couple more tests, and improved the verification and error messages.
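The kind of bug described above can be sketched with plain `java.util.Properties`. This is a guess at the general shape, not the actual line from the PR: the class `ConsumerPropsSketch` and both method names are hypothetical, and using `putIfAbsent` is shown only as an alternative to deleting the line.

```java
import java.util.Properties;

// Hypothetical sketch; the real test setup code in the PR may look different.
class ConsumerPropsSketch {
    static final String MAX_POLL_RECORDS = "max.poll.records";

    // Buggy shape: the unconditional put replaces whatever value the test supplied.
    static Properties withUnconditionalDefault(Properties fromTest) {
        Properties props = new Properties();
        props.putAll(fromTest);
        props.put(MAX_POLL_RECORDS, "500");
        return props;
    }

    // Alternative shape: the value is only applied when the test did not set one.
    static Properties withFallbackDefault(Properties fromTest) {
        Properties props = new Properties();
        props.putAll(fromTest);
        props.putIfAbsent(MAX_POLL_RECORDS, "500");
        return props;
    }
}
```

With the first method, a test that sets `max.poll.records` to 1 silently runs with 500, so a low-max-poll test never exercises the low-max-poll case.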
In my branch the following tests now fail fairly consistently:
- io.confluent.parallelconsumer.examples.core.Bug25AppTest#testTransactionalLowMaxPoll (1) (infinite loop on every run)
- io.confluent.parallelconsumer.examples.core.Bug25AppTest#testTransactionalDefaultMaxPoll (500) (infinite loop about 50% of the time)
Typically the infinite loop occurs when processed - produced = 16 (16 = default number of threads).
All non-tx tests work, and tx works when running with maxPollRecords=10000.
Check out the updated test:
JorgenRingen@be57d92