longhorn / longhorn

Cloud-Native distributed storage built on and for Kubernetes

Home Page: https://longhorn.io


[IMPROVEMENT] Speed up replica rebuilding by reusing the last transferred data

PhanLe1010 opened this issue · comments

Is your improvement request related to a feature? Please describe (👍 if you like this request)

Longhorn already has the fast replica rebuilding feature, which is great. It relies on the checksums of snapshot disk files, which are calculated some time after the snapshot is created and the rebuilding has finished (ref link).
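
For context, the skip decision this feature enables is roughly the following. A minimal sketch with hypothetical file names, using SHA-512 as an illustrative hash (not Longhorn's actual code):

```go
package main

import (
	"crypto/sha512"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// snapshotChecksum hashes a whole snapshot disk file, as the asynchronous
// checksum task would.
func snapshotChecksum(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha512.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	srcSum, err := snapshotChecksum("source-snap.img")
	if err != nil {
		panic(err)
	}
	dstSum, err := snapshotChecksum("rebuilding-snap.img")
	if err != nil {
		panic(err)
	}
	if srcSum == dstSum {
		fmt.Println("checksums match: skip transferring this snapshot file")
	} else {
		fmt.Println("checksums differ: the file must be synced")
	}
}
```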

However, the current fast replica rebuilding feature cannot solve the following use case:

  1. Users create a volume with 1 replica
  2. Users increase the number of replicas to 2
  3. The rebuilding progress reaches 99% and then fails
  4. Longhorn cannot skip transferring the already-transferred data because it doesn't have a checksum yet, so it has to fall back to the following mechanism (a sketch of this loop follows the list):
    1. Loop over each data chunk
    2. Compare the checksums of the data chunk on the source and on the rebuilding replica (the checksum calculation costs CPU)
    3. Transfer the data chunk if the checksums mismatch
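
A minimal local sketch of that fallback loop, assuming fixed-size 2 MiB chunks and SHA-256 (illustrative choices; the real sync runs between replicas over the network rather than on two local files):

```go
package main

import (
	"crypto/sha256"
	"io"
	"os"
)

const chunkSize = 2 << 20 // 2 MiB chunks; an illustrative size

// syncByChunks is the fallback loop from step 4: hash each chunk on both
// sides (this hashing is where the CPU cost goes) and copy only the chunks
// whose checksums mismatch.
func syncByChunks(src, dst *os.File) error {
	sbuf := make([]byte, chunkSize)
	dbuf := make([]byte, chunkSize)
	for off := int64(0); ; off += chunkSize {
		n, err := src.ReadAt(sbuf, off)
		if err != nil && err != io.EOF {
			return err
		}
		if n == 0 {
			return nil // end of source file
		}
		m, derr := dst.ReadAt(dbuf, off)
		if derr != nil && derr != io.EOF {
			return derr
		}
		// Transfer only when the chunk checksums mismatch.
		if m != n || sha256.Sum256(sbuf[:n]) != sha256.Sum256(dbuf[:m]) {
			if _, werr := dst.WriteAt(sbuf[:n], off); werr != nil {
				return werr
			}
		}
		if err == io.EOF {
			return nil // last (possibly partial) chunk handled
		}
	}
}

func main() {
	src, err := os.Open("source-snap.img")
	if err != nil {
		panic(err)
	}
	defer src.Close()
	dst, err := os.OpenFile("rebuilding-snap.img", os.O_RDWR|os.O_CREATE, 0o644)
	if err != nil {
		panic(err)
	}
	defer dst.Close()
	if err := syncByChunks(src, dst); err != nil {
		panic(err)
	}
}
```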

Some ideas to help with the above use case:

Idea 1: (simple implementation - limited benefit)

  1. Compute a continuous checksum on the rebuilding replica side while the data is being transferred to it
  2. Once the snapshot file is fully transferred, record the checksum immediately
  3. If the rebuilding replica fails, the next time we rebuild by reusing this replica we can skip this snapshot
  4. This only helps to speed up the process at snapshot completion milestones (see the sketch below)
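
A minimal sketch of Idea 1, assuming the snapshot arrives as a linear stream (the real receiver applies sparse data/hole intervals at offsets); `receiveSnapshot` and the `.checksum` sidecar name are hypothetical:

```go
package main

import (
	"crypto/sha512"
	"encoding/hex"
	"io"
	"os"
)

// receiveSnapshot tees the incoming snapshot stream through a running hash
// while writing it to disk, then records the checksum as soon as the file
// completes, instead of waiting for the later asynchronous checksum task.
func receiveSnapshot(stream io.Reader, path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	h := sha512.New()
	// Every byte written to the file also updates the continuous checksum.
	if _, err := io.Copy(io.MultiWriter(f, h), stream); err != nil {
		return err
	}

	// Step 2: record the checksum immediately, so a later rebuild that
	// reuses this replica can skip the whole snapshot (step 3).
	sum := hex.EncodeToString(h.Sum(nil))
	return os.WriteFile(path+".checksum", []byte(sum), 0o644)
}

func main() {
	// Illustrative usage: treat stdin as the incoming snapshot stream.
	if err := receiveSnapshot(os.Stdin, "rebuilding-snap.img"); err != nil {
		panic(err)
	}
}
```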

Idea 2: (complex implementation - more benefit)
@shuo-wu suggests the following:

  1. Compute a continuous checksum on the rebuilding replica side while the data is being transferred to it
  2. Record the continuous checksum in a separate file
  3. If the rebuilding replica fails, the next time we rebuild by reusing this replica we can skip a transferred data region if its checksum matches the one recorded in the file above
  4. This helps to speed up the process at data chunk completion milestones (see the sketch below)
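
A minimal sketch of the per-region checksum file from Idea 2; all names (`regionLedger`, the `.ledger` path) and the JSON on-disk format are hypothetical:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"os"
)

// regionLedger maps region start offsets to the checksum of the bytes written
// there during a previous (possibly failed) rebuild attempt.
type regionLedger map[int64]string

func checksum(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// loadLedger reads the separate checksum file from step 2; a missing file
// means this is the first rebuild attempt.
func loadLedger(path string) (regionLedger, error) {
	l := regionLedger{}
	data, err := os.ReadFile(path)
	if os.IsNotExist(err) {
		return l, nil
	}
	if err != nil {
		return nil, err
	}
	return l, json.Unmarshal(data, &l)
}

func (l regionLedger) save(path string) error {
	data, err := json.Marshal(l)
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}

// shouldSkip implements step 3: a region can be reused if its recorded
// checksum matches the checksum of the same region on the source.
func (l regionLedger) shouldSkip(off int64, srcRegion []byte) bool {
	recorded, ok := l[off]
	return ok && recorded == checksum(srcRegion)
}

func main() {
	ledger, err := loadLedger("rebuilding-snap.ledger")
	if err != nil {
		panic(err)
	}
	region := []byte("example region payload")
	if !ledger.shouldSkip(0, region) {
		// ... transfer the region, then record its continuous checksum:
		ledger[0] = checksum(region)
		if err := ledger.save("rebuilding-snap.ledger"); err != nil {
			panic(err)
		}
	}
}
```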

Additional context:

We need to investigate how the trim feature can affect these ideas.

cc @shuo-wu @ejweber @innobead @derekbit

Should we issue a sync request for each rebuilt snapshot file or each data region?

Is there any concern if we keep the current implementation of doing a sync gRPC call for each snapshot file? I am proposing to modify sparse-tools (used by the receiver launched by the sync agent of the rebuilding replica) here https://github.com/longhorn/sparse-tools/blob/ed49dd3f93eb42590d1b57686572ee739106c099/sparse/rest/handlers.go#L245-L270 to calculate the continuous checksum after each data/hole transfer completes.
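
This is not the actual handlers.go code; a hypothetical sketch of where the per-interval hash update could hook in, with invented types (`interval`, `applyInterval`) standing in for the real sparse-tools structures:

```go
package main

import (
	"crypto/sha512"
	"encoding/binary"
	"fmt"
	"hash"
	"os"
)

// interval is a simplified stand-in for the data/hole intervals the sparse
// receiver applies; data == nil denotes a hole.
type interval struct {
	offset, length int64
	data           []byte
}

// applyInterval writes one interval and then folds it into the running hash;
// this is the kind of hook the proposal would add after each data/hole
// transfer completes.
func applyInterval(f *os.File, h hash.Hash, iv interval) error {
	if iv.data != nil {
		if _, err := f.WriteAt(iv.data, iv.offset); err != nil {
			return err
		}
		h.Write(iv.data) // continuous checksum over transferred data
		return nil
	}
	// Hole: nothing is transferred, so fold the hole's metadata into the
	// checksum instead of materializing zeroes (one possible design choice).
	var meta [16]byte
	binary.BigEndian.PutUint64(meta[:8], uint64(iv.offset))
	binary.BigEndian.PutUint64(meta[8:], uint64(iv.length))
	h.Write(meta[:])
	return nil
}

func main() {
	f, err := os.Create("rebuilding-snap.img")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	h := sha512.New()
	_ = applyInterval(f, h, interval{offset: 0, length: 4, data: []byte("data")})
	_ = applyInterval(f, h, interval{offset: 4096, length: 4096, data: nil})
	fmt.Printf("continuous checksum so far: %x\n", h.Sum(nil))
}
```

Folding hole metadata into the hash (rather than hashing zeroes) keeps the cost of large holes constant; whether that matches the checksum semantics Longhorn needs, and how it interacts with trim, is part of the open question above.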