[BUG] Ownership can timeout on full buffer for pull based sources
graytaylor0 opened this issue · comments
Describe the bug
we currently update source coordination ownership for partitions synchronously in pull based sources like S3, OpenSearch, and DynamoDB. This happens in a loop approximately every 2 minutes, but when the buffer is very full, we spend time retrying to write to the buffer, which leads to expiring ownership of the partition, and reprocessing of that partition by another node of Data Prepper
Expected behavior
Asynchronously update ownership every 2 minutes without depending on the primary loop. For example, this is done here for DynamoDB (
Alternative consideration
Increase the ownership timeout to be a higher value or check ownership updates in between attempts to write to the buffer
Screenshots
If applicable, add screenshots to help explain your problem.
Environment (please complete the following information):
- OS: [e.g. Ubuntu 20.04 LTS]
- Version [e.g. 22]
Additional context
Add any other context about the problem here.
Maybe we can just have buffer accumulator take in a callback that runs when the buffer times out.
I'm not sure how that would be different. When it times out, doesn't the current thread continue and then iterate back to getting ownership? Or is there something in between?
If I understand the problem correctly, the ownership is expiring during the write to the buffer.
@dlvenable Buffer accumulator currently will block and retry internally here (