ReplayMissedEvents incorrectly assumes an autoincrement value of 1 for all deployments
stevend-uber opened this issue
During a validation that we did for the 1.9.6 server, we noticed a single occurrence of the log line:
Detected skipped attested node event
followed by constantly recurring logs of:
Event not yet populated in database
After further discussion, it was found that the updateCache method assumes that all SQL stores use a value of 1 for auto_increment_increment. In deployments where this is not true, the method incorrectly populates the missedEvents map with all of the in-between IDs, which never correspond to real events.
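To make the failure mode concrete, here is a minimal Go sketch (not SPIRE's actual updateCache code; the trackGaps helper and the bare map are hypothetical stand-ins for the real bookkeeping) of what happens when gap detection assumes an increment of 1 but the database steps IDs by 2:

```go
package main

import "fmt"

// trackGaps mimics the assumption in question: every ID between the last seen
// event and the new one is treated as a missed event that must be replayed.
// It only illustrates the effect of assuming auto_increment_increment == 1.
func trackGaps(lastSeen, newID uint, missedEvents map[uint]struct{}) {
	for id := lastSeen + 1; id < newID; id++ {
		missedEvents[id] = struct{}{}
	}
}

func main() {
	missed := make(map[uint]struct{})
	lastSeen := uint(0)

	// IDs actually issued by a server configured with auto_increment_increment = 2.
	for _, id := range []uint{1, 3, 5, 7, 9} {
		trackGaps(lastSeen, id, missed)
		lastSeen = id
	}

	// Every even ID in between is now a false positive that will be polled for.
	fmt.Println(len(missed), "phantom missed events:", missed)
}
```

Every even-numbered ID lands in the map as a phantom missed event that the replayMissedEvents poll will then keep chasing.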
This behavior incurs additional CPU cycles, log noise, and memory usage from storing the false positives and running the replayMissedEvents poll against them. At the scale of tens of thousands of entries per deployment and/or a highly dynamic environment, this causes a new false-positive missed event every single time an entry is created.
From reading extensively, the system is using Group Replication, which means each server is tuned to use auto-increment values that will not be issued elsewhere. This removes the need to coordinate so that the same ID isn't issued by two different servers at the same time, as might otherwise occur when switching the primary write server.
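For reference, the effective spacing can be read straight from MySQL. The sketch below (the connection string is a placeholder and nothing here is SPIRE-specific) queries the standard auto_increment_increment and auto_increment_offset variables:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql" // MySQL driver registered for database/sql
)

func main() {
	// DSN is a placeholder; use the deployment's real connection string.
	db, err := sql.Open("mysql", "user:password@tcp(127.0.0.1:3306)/spire")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var increment, offset int
	// Standard MySQL system variables; under Group Replication the increment is
	// typically greater than 1 and the offset differs per server.
	err = db.QueryRow("SELECT @@auto_increment_increment, @@auto_increment_offset").
		Scan(&increment, &offset)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("auto_increment_increment=%d auto_increment_offset=%d\n", increment, offset)
}
```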
In a two-server cluster, this means each server generates IDs in sequences that might resemble:
- server 1 - 1, 3, 5, 7, 9
- server 2 - 2, 4, 6, 8, 10
Under normal operations, only one server generates IDs at a time, and to improve interoperability with programs that assume monotonically incrementing IDs, after a cutover most systems generate their "next" ID higher than the last one they know was used.
So a sequence might look like 1, 3, 5, 7, 8, 10, 12 when a cutover occurs at the 7-to-8 transition. This means there is no guaranteed skip pattern that can be matched against the server, because there is no guarantee that the replication cluster didn't switch to a new primary write server after any given ID.
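A purely illustrative sketch of that point: even if the reader tried to infer the cluster's increment from the IDs it has already seen, the cutover step breaks the inferred pattern, and that step is indistinguishable from a genuinely skipped event:

```go
package main

import "fmt"

func main() {
	// IDs observed around a primary cutover, as in the sequence above.
	ids := []uint{1, 3, 5, 7, 8, 10, 12}

	// Naively infer the cluster's increment from the first observed step...
	inferred := ids[1] - ids[0]

	// ...and check every later step against it.
	for i := 2; i < len(ids); i++ {
		if step := ids[i] - ids[i-1]; step != inferred {
			fmt.Printf("step %d -> %d has width %d, not the inferred %d; "+
				"it could be a cutover or a genuinely missed event\n",
				ids[i-1], ids[i], step, inferred)
		}
	}
}
```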
The logic to ensure no IDs are skipped will continue to work without error as-is, so the concerns moving forward are about reducing resource consumption (the original goal of the entire effort). Without database replication, the logic works very efficiently. With replication of this form, each added entry incurs polling costs for 24 hours (the longest a transaction remains open on these platforms) and is efficient afterwards.
We are currently investigating suitable backoff algorithms that keep the main logic as-is while reducing polling for skipped IDs that linger for longer than a few minutes. This should reduce the current cost of skipped IDs during the first 24 hours, while preserving the zero ongoing cost of entries that remain static beyond 24 hours.
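As a rough sketch of the shape such a backoff could take (entirely hypothetical; the missedEvent type, the intervals, and the 24-hour retention constant are illustrative choices, not the final design), each suspected-skipped ID could be polled on an exponentially growing interval and forgotten once it ages past the open-transaction horizon:

```go
package main

import (
	"fmt"
	"time"
)

// missedEvent is a hypothetical record for one suspected-skipped event ID.
type missedEvent struct {
	firstSeen time.Time
	nextPoll  time.Time
	interval  time.Duration
}

const (
	initialInterval = 5 * time.Second
	maxInterval     = 10 * time.Minute
	retention       = 24 * time.Hour // longest an open transaction can hide an event
)

// shouldPoll reports whether the ID is due for another poll (advancing its
// backoff schedule if so) and whether it should be kept at all. Once the ID
// has aged past the retention window it can be forgotten.
func (e *missedEvent) shouldPoll(now time.Time) (poll, keep bool) {
	if now.Sub(e.firstSeen) > retention {
		return false, false
	}
	if now.Before(e.nextPoll) {
		return false, true
	}
	// Exponential backoff: double the interval up to the cap.
	e.interval *= 2
	if e.interval > maxInterval {
		e.interval = maxInterval
	}
	e.nextPoll = now.Add(e.interval)
	return true, true
}

func main() {
	now := time.Now()
	e := &missedEvent{firstSeen: now, nextPoll: now, interval: initialInterval}

	// Simulate the polling loop at one-minute ticks.
	for t := now; t.Before(now.Add(30 * time.Minute)); t = t.Add(time.Minute) {
		if poll, keep := e.shouldPoll(t); poll {
			fmt.Printf("poll at +%v (next in %v, keep=%v)\n", t.Sub(now), e.interval, keep)
		}
	}
}
```

In this simulation a lingering phantom ID is polled at roughly 0, 1, 2, 3, 5, 8, 14, and 24 minutes rather than on every cycle, while the fast path for IDs that show up within a few seconds is unchanged.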
@amartinezfayo Please assign this issue to me.
We will be looking at implementing a different algorithm for the events-based cache event tracking in #5624.
The new algorithm should solve existing issues related to the events-based cache. Once the algorithm described in #5624 is implemented, we can come back to this issue and figure out whether any additional work is needed.