longhorn / longhorn

Cloud-Native distributed storage built on and for Kubernetes

Home Page: https://longhorn.io


[IMPROVEMENT] Speed up replica rebuilding by reusing the last transferred data

PhanLe1010 opened this issue · comments

Is your improvement request related to a feature? Please describe (👍 if you like this request)

Longhorn already has the fast replica rebuilding feature, which is great. It relies on the checksums of snapshot disk files, which are calculated some time after the snapshot is created and the rebuilding has finished (ref link).
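
For context, the skip decision this feature enables is roughly the following. A minimal sketch with hypothetical file names, using SHA-512 as an illustrative hash (not Longhorn's actual code):

```go
package main

import (
	"crypto/sha512"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// snapshotChecksum hashes a whole snapshot disk file, as the asynchronous
// checksum task would.
func snapshotChecksum(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha512.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	srcSum, err := snapshotChecksum("source-snap.img")
	if err != nil {
		panic(err)
	}
	dstSum, err := snapshotChecksum("rebuilding-snap.img")
	if err != nil {
		panic(err)
	}
	if srcSum == dstSum {
		fmt.Println("checksums match: skip transferring this snapshot file")
	} else {
		fmt.Println("checksums differ: the file must be synced")
	}
}
```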

However, the current fast replica rebuilding feature cannot solve the following use case:

  1. Users create a volume with 1 replica
  2. Users increase the number of replicas to 2
  3. The rebuilding progress reaches 99% and then fails
  4. Longhorn cannot skip transferring the already-transferred data because it doesn't have a checksum yet, so it has to fall back to the following mechanism (a sketch of this loop follows the list):
    1. Loop over each data chunk
    2. Compare the checksums of the data chunk on the source and on the rebuilding replica (the checksum calculation costs CPU)
    3. Transfer the data chunk if the checksums mismatch
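
A minimal local sketch of that fallback loop, assuming fixed-size 2 MiB chunks and SHA-256 (illustrative choices; the real sync runs between replicas over the network rather than on two local files):

```go
package main

import (
	"crypto/sha256"
	"io"
	"os"
)

const chunkSize = 2 << 20 // 2 MiB chunks; an illustrative size

// syncByChunks is the fallback loop from step 4: hash each chunk on both
// sides (this hashing is where the CPU cost goes) and copy only the chunks
// whose checksums mismatch.
func syncByChunks(src, dst *os.File) error {
	sbuf := make([]byte, chunkSize)
	dbuf := make([]byte, chunkSize)
	for off := int64(0); ; off += chunkSize {
		n, err := src.ReadAt(sbuf, off)
		if err != nil && err != io.EOF {
			return err
		}
		if n == 0 {
			return nil // end of source file
		}
		m, derr := dst.ReadAt(dbuf, off)
		if derr != nil && derr != io.EOF {
			return derr
		}
		// Transfer only when the chunk checksums mismatch.
		if m != n || sha256.Sum256(sbuf[:n]) != sha256.Sum256(dbuf[:m]) {
			if _, werr := dst.WriteAt(sbuf[:n], off); werr != nil {
				return werr
			}
		}
		if err == io.EOF {
			return nil // last (possibly partial) chunk handled
		}
	}
}

func main() {
	src, err := os.Open("source-snap.img")
	if err != nil {
		panic(err)
	}
	defer src.Close()
	dst, err := os.OpenFile("rebuilding-snap.img", os.O_RDWR|os.O_CREATE, 0o644)
	if err != nil {
		panic(err)
	}
	defer dst.Close()
	if err := syncByChunks(src, dst); err != nil {
		panic(err)
	}
}
```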

Some ideas to help with the above use case:

Idea 1: (simple implementation - limited benefit)

  1. Compute a continuous checksum on the rebuilding replica side while the data is being transferred to it
  2. Once the snapshot file is fully transferred, record the checksum immediately
  3. If the rebuilding replica fails, the next time we rebuild by reusing this replica we can skip this snapshot
  4. This only helps to speed up the process at snapshot completion milestones (see the sketch below)
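
A minimal sketch of Idea 1, assuming the snapshot arrives as a linear stream (the real receiver applies sparse data/hole intervals at offsets); `receiveSnapshot` and the `.checksum` sidecar name are hypothetical:

```go
package main

import (
	"crypto/sha512"
	"encoding/hex"
	"io"
	"os"
)

// receiveSnapshot tees the incoming snapshot stream through a running hash
// while writing it to disk, then records the checksum as soon as the file
// completes, instead of waiting for the later asynchronous checksum task.
func receiveSnapshot(stream io.Reader, path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	h := sha512.New()
	// Every byte written to the file also updates the continuous checksum.
	if _, err := io.Copy(io.MultiWriter(f, h), stream); err != nil {
		return err
	}

	// Step 2: record the checksum immediately, so a later rebuild that
	// reuses this replica can skip the whole snapshot (step 3).
	sum := hex.EncodeToString(h.Sum(nil))
	return os.WriteFile(path+".checksum", []byte(sum), 0o644)
}

func main() {
	// Illustrative usage: treat stdin as the incoming snapshot stream.
	if err := receiveSnapshot(os.Stdin, "rebuilding-snap.img"); err != nil {
		panic(err)
	}
}
```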

Idea 2: (complex implementation - more benefit)
@shuo-wu suggests the following:

  1. Compute a continuous checksum on the rebuilding replica side while the data is being transferred to it
  2. Record the continuous checksum in a separate file
  3. If the rebuilding replica fails, the next time we rebuild by reusing this replica we can skip a transferred data region if its checksum matches the one recorded in the file above
  4. This helps to speed up the process at data chunk completion milestones (see the sketch below)
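
A minimal sketch of the per-region checksum file from Idea 2; all names (`regionLedger`, the `.ledger` path) and the JSON on-disk format are hypothetical:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"os"
)

// regionLedger maps region start offsets to the checksum of the bytes written
// there during a previous (possibly failed) rebuild attempt.
type regionLedger map[int64]string

func checksum(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// loadLedger reads the separate checksum file from step 2; a missing file
// means this is the first rebuild attempt.
func loadLedger(path string) (regionLedger, error) {
	l := regionLedger{}
	data, err := os.ReadFile(path)
	if os.IsNotExist(err) {
		return l, nil
	}
	if err != nil {
		return nil, err
	}
	return l, json.Unmarshal(data, &l)
}

func (l regionLedger) save(path string) error {
	data, err := json.Marshal(l)
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}

// shouldSkip implements step 3: a region can be reused if its recorded
// checksum matches the checksum of the same region on the source.
func (l regionLedger) shouldSkip(off int64, srcRegion []byte) bool {
	recorded, ok := l[off]
	return ok && recorded == checksum(srcRegion)
}

func main() {
	ledger, err := loadLedger("rebuilding-snap.ledger")
	if err != nil {
		panic(err)
	}
	region := []byte("example region payload")
	if !ledger.shouldSkip(0, region) {
		// ... transfer the region, then record its continuous checksum:
		ledger[0] = checksum(region)
		if err := ledger.save("rebuilding-snap.ledger"); err != nil {
			panic(err)
		}
	}
}
```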

Additional context:

We need to investigate how the trim feature can affect these ideas.

cc @shuo-wu @ejweber @innobead @derekbit

Should we issue a sync request for each rebuilt snapshot file or each data region?

Is there any concern if we keep the current implementation of doing a sync gRPC call for each snapshot file? I am proposing to modify sparse-tools (used by the receiver launched by the sync agent of the rebuilding replica) here https://github.com/longhorn/sparse-tools/blob/ed49dd3f93eb42590d1b57686572ee739106c099/sparse/rest/handlers.go#L245-L270 to calculate the continuous checksum after each data/hole transfer completes.
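
This is not the actual handlers.go code; a hypothetical sketch of where the per-interval hash update could hook in, with invented types (`interval`, `applyInterval`) standing in for the real sparse-tools structures:

```go
package main

import (
	"crypto/sha512"
	"encoding/binary"
	"fmt"
	"hash"
	"os"
)

// interval is a simplified stand-in for the data/hole intervals the sparse
// receiver applies; data == nil denotes a hole.
type interval struct {
	offset, length int64
	data           []byte
}

// applyInterval writes one interval and then folds it into the running hash;
// this is the kind of hook the proposal would add after each data/hole
// transfer completes.
func applyInterval(f *os.File, h hash.Hash, iv interval) error {
	if iv.data != nil {
		if _, err := f.WriteAt(iv.data, iv.offset); err != nil {
			return err
		}
		h.Write(iv.data) // continuous checksum over transferred data
		return nil
	}
	// Hole: nothing is transferred, so fold the hole's metadata into the
	// checksum instead of materializing zeroes (one possible design choice).
	var meta [16]byte
	binary.BigEndian.PutUint64(meta[:8], uint64(iv.offset))
	binary.BigEndian.PutUint64(meta[8:], uint64(iv.length))
	h.Write(meta[:])
	return nil
}

func main() {
	f, err := os.Create("rebuilding-snap.img")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	h := sha512.New()
	_ = applyInterval(f, h, interval{offset: 0, length: 4, data: []byte("data")})
	_ = applyInterval(f, h, interval{offset: 4096, length: 4096, data: nil})
	fmt.Printf("continuous checksum so far: %x\n", h.Sum(nil))
}
```

Folding hole metadata into the hash (rather than hashing zeroes) keeps the cost of large holes constant; whether that matches the checksum semantics Longhorn needs, and how it interacts with trim, is part of the open question above.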