gchq / sleeper

A cloud-native, serverless, scalable, cheap key-value store

Asynchronous commit of compaction input file assignment

patchwork01 opened this issue

Background

We perform state store updates in the state store committer lambda. It is fed by a FIFO queue that accepts arbitrary state store update requests. The queue uses the table ID as the message group ID, so updates for each table are processed in order by a single instance of the lambda at a time. Because a single instance handles each table, this should remove contention on the state store once all updates are applied through the lambda.

Description

We'd like to apply assignment of input files to compaction jobs in the state store committer lambda.

Analysis

We can add a table property to decide whether to use the old (synchronous) or new (asynchronous) behaviour to assign input files.

When the compaction job creation lambda creates jobs (in CreateCompactionJobs), instead of updating the state store directly, it can submit requests to the state store committer lambda.
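
As a minimal sketch of that switch, the following shows job creation choosing between a direct state store update and a request to the committer queue. All names here (`AssignJobIdRequest`, `StateStore`, `CommitterQueue`, the `asyncCommit` flag) are hypothetical stand-ins for illustration, not Sleeper's actual classes or property names.

```java
import java.util.List;

class AssignmentCommitSketch {
    /** Hypothetical request: assign these input files to this compaction job. */
    record AssignJobIdRequest(String jobId, String partitionId, List<String> filenames) {}

    /** Hypothetical stand-in for a direct (synchronous) state store update. */
    interface StateStore {
        void assignJobIds(List<AssignJobIdRequest> requests);
    }

    /** Hypothetical stand-in for sending to the state store committer queue. */
    interface CommitterQueue {
        void send(String tableId, List<AssignJobIdRequest> requests);
    }

    /**
     * Applies the assignments either asynchronously or synchronously.
     * The asyncCommit flag is assumed to have been read from a table property.
     */
    static void commitAssignments(boolean asyncCommit, String tableId,
            List<AssignJobIdRequest> requests, StateStore stateStore, CommitterQueue queue) {
        if (asyncCommit) {
            // New behaviour: the committer lambda applies the update later
            queue.send(tableId, requests);
        } else {
            // Old behaviour: update the state store directly
            stateStore.assignJobIds(requests);
        }
    }
}
```

Keeping both paths behind one method would let the table property be flipped per table without changing the job creation logic around it.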

We've split out a separate issue to support applying the file assignment in the state store committer lambda, which we can do before this issue:

Submit to compaction queue before/after input file assignment

We could wait until after input file assignment to submit jobs to the compaction queue, but that would cause problems. We currently submit the jobs to the compaction queue first to avoid a state where input files are assigned to a compaction job but the job is never submitted to the queue. In that case nothing would ever unassign the files from the failed compaction job, and the files would never be processed without manual intervention.

Instead, we'd like to keep the current order, with compaction jobs going on the compaction queue before the input file assignment. The compaction job creation lambda will need to submit the jobs to the compaction queue first, and then submit the file assignment commits to the state store committer queue.
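
The ordering described above can be sketched as follows. The interfaces and method names are hypothetical, chosen only to illustrate the sequence: jobs go to the compaction queue first, and the assignment commit is sent only if that succeeded. How a queued job whose assignment never arrives is handled is outside this sketch.

```java
import java.util.List;

class JobCreationOrderSketch {
    /** Hypothetical stand-in for the compaction job queue. */
    interface CompactionQueue {
        void send(String jobId);
    }

    /** Hypothetical stand-in for the state store committer queue. */
    interface CommitterQueue {
        void sendAssignments(String tableId, List<String> jobIds);
    }

    static void submit(String tableId, List<String> jobIds,
            CompactionQueue compactionQueue, CommitterQueue committerQueue) {
        // Step 1: put every job on the compaction queue.
        // If this throws, no assignment commit is sent, so no file can end up
        // assigned to a job that was never queued.
        for (String jobId : jobIds) {
            compactionQueue.send(jobId);
        }
        // Step 2: only then request the input file assignment
        committerQueue.sendAssignments(tableId, jobIds);
    }
}
```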

Transaction batching

We could set it up so that a single invocation of the compaction job creation lambda creates a single request or a small number of requests for the state store committer lambda. We can assign input files for quite a large number of compactions in a single state store update.

This is already the behaviour of the compaction job creation lambda with synchronous updates. We'll want to produce a request to the state store committer that is similar to the current direct request to the state store.
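
One way to get "a single request or a small number of requests" is to split the assignments for an invocation into batches before sending. This is a sketch only: the class name and the `maxPerRequest` cap are assumptions (for example, a cap chosen to keep each message within the queue's size limit), not something the issue specifies.

```java
import java.util.ArrayList;
import java.util.List;

class AssignmentBatchingSketch {
    /**
     * Splits the assignments into consecutive batches of at most maxPerRequest,
     * preserving order. Each batch would become one committer request.
     */
    static <T> List<List<T>> splitIntoBatches(List<T> assignments, int maxPerRequest) {
        if (maxPerRequest < 1) {
            throw new IllegalArgumentException("maxPerRequest must be at least 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < assignments.size(); start += maxPerRequest) {
            int end = Math.min(start + maxPerRequest, assignments.size());
            // Copy the sub-list so each batch is independent of the input list
            batches.add(List.copyOf(assignments.subList(start, end)));
        }
        return batches;
    }
}
```

With a large enough cap this yields a single batch, which matches the current synchronous behaviour of one state store update per invocation.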