Describe the bug
We currently update source coordination ownership for partitions synchronously in pull-based sources such as S3, OpenSearch, and DynamoDB. The update happens in a loop approximately every 2 minutes. When the buffer is very full, however, the loop spends its time retrying writes to the buffer, so ownership of the partition expires and another Data Prepper node reprocesses that partition.

Expected behavior
Asynchronously update ownership every 2 minutes without depending on the primary loop. For example, this is done here for DynamoDB (data-prepper/data-prepper-plugins/dynamodb-source/src/main/java/org/opensearch/dataprepper/plugins/source/dynamodb/export/DataFileLoader.java, line 206 in a20756c):

    if (System.currentTimeMillis() - lastCheckpointTime > DEFAULT_CHECKPOINT_INTERVAL_MILLS) {

We should update ownership in a timely manner regardless of how long it takes to write to the buffer.
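To make the expected behavior concrete, here is a minimal sketch of renewing ownership on a background thread, so that renewal keeps happening while the main loop is blocked on the buffer. `OwnershipRenewer` and the `renewOwnership` callback are hypothetical names used for illustration; they are not existing Data Prepper classes, and the real source coordination call would go inside the callback.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of Data Prepper): renews partition ownership
// on its own thread so renewal does not wait on buffer writes.
final class OwnershipRenewer implements AutoCloseable {
    private final ScheduledExecutorService scheduler;
    private final ScheduledFuture<?> task;

    OwnershipRenewer(final Runnable renewOwnership, final long interval, final TimeUnit unit) {
        this.scheduler = Executors.newSingleThreadScheduledExecutor(runnable -> {
            final Thread thread = new Thread(runnable, "ownership-renewer");
            thread.setDaemon(true); // do not keep the JVM alive for renewals
            return thread;
        });
        // An uncaught exception would cancel all later runs of a scheduled
        // task, so failures are caught and reported instead.
        final Runnable safeRenew = () -> {
            try {
                renewOwnership.run();
            } catch (final RuntimeException e) {
                System.err.println("Ownership renewal failed: " + e);
            }
        };
        this.task = scheduler.scheduleAtFixedRate(safeRenew, interval, interval, unit);
    }

    // Stop renewing, for example when the partition is completed or given up.
    @Override
    public void close() {
        task.cancel(false);
        scheduler.shutdownNow();
    }
}
```

A source could wrap the processing of one partition in `try (OwnershipRenewer renewer = new OwnershipRenewer(callback, 2, TimeUnit.MINUTES)) { ... }`, assuming the renewal call is safe to invoke from a second thread.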

Alternative consideration
Increase the ownership timeout to a higher value, or check for ownership updates in between attempts to write to the buffer.

Maybe we can just have buffer accumulator take in a callback that runs when the buffer times out.

@graytaylor0,

I'm not sure how that would be different. When it times out, doesn't the current thread continue and then iterate back to getting ownership? Or is there something in between?

If I understand the problem correctly, the ownership is expiring during the write to the buffer.

@dlvenable Buffer accumulator currently blocks and retries internally here (data-prepper/data-prepper-plugins/buffer-common/src/main/java/org/opensearch/dataprepper/buffer/common/BufferAccumulator.java, line 73 in ef39d4f):

    flushWithBackoff();

So I was thinking the callback would run in between the backoff retries at some point here (same file, line 93 in ef39d4f):

    flushedSuccessfully = flushBufferFuture.get();
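As a rough sketch of the callback idea, a retry loop could invoke a caller-supplied hook after each failed attempt, before backing off. Everything here is hypothetical and simplified: `RetryWithCallback`, its parameters, and the backoff values are illustrative assumptions, not the actual `BufferAccumulator` API.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch, not the actual BufferAccumulator implementation.
final class RetryWithCallback {
    private static final long MAX_BACKOFF_MILLIS = 1_000; // assumed cap

    // Tries the flush up to maxAttempts times. After every failed attempt the
    // betweenAttempts hook runs, which is where a source could renew ownership.
    static boolean flushWithBackoff(final Callable<Boolean> tryFlush,
                                    final Runnable betweenAttempts,
                                    final int maxAttempts,
                                    final long initialBackoffMillis) throws Exception {
        long backoffMillis = initialBackoffMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryFlush.call()) {
                return true;
            }
            // The write failed or timed out: let the caller act before waiting.
            betweenAttempts.run();
            if (attempt < maxAttempts) {
                Thread.sleep(backoffMillis);
                backoffMillis = Math.min(backoffMillis * 2, MAX_BACKOFF_MILLIS);
            }
        }
        return false;
    }
}
```

With this shape the hook only runs as often as attempts fail, so renewal still depends on each individual write attempt timing out well before ownership expires.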

[BUG] Ownership can timeout on full buffer for pull based sources