NaPs / Marty

An efficient backup tool inspired by Git, saving your bandwidth and providing global deduplication at file level.

Implement a garbage collector

NaPs opened this issue.

Implement a garbage collector that deletes objects in the pool that are no longer referenced by any label or by another object. Also implement a "gc" command so that users can run this garbage collector from the CLI.
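A minimal sketch of how a "gc" subcommand could be wired up, assuming an argparse-based CLI. The program name `marty` and the functions `build_parser`, `run_gc` and `main` are hypothetical and are not taken from the project's code.

```python
import argparse
from typing import List, Optional


def run_gc(args: argparse.Namespace) -> int:
    # Hypothetical placeholder: a real command would take the pool lock, then run
    # the mark phase (walk labels) and the sweep phase (delete unreferenced objects)
    return 0


def build_parser() -> argparse.ArgumentParser:
    # Assumed program name; the actual CLI entry point may differ
    parser = argparse.ArgumentParser(prog="marty")
    subparsers = parser.add_subparsers(dest="command", required=True)
    gc_parser = subparsers.add_parser(
        "gc", help="delete pool objects no longer referenced by any label or object"
    )
    # Dispatch: each subcommand stores the function that implements it
    gc_parser.set_defaults(func=run_gc)
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    return args.func(args)
```

Storing the handler with `set_defaults(func=...)` keeps dispatch uniform if more subcommands are added later.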

Basic algorithm idea

  • Iterate over each label and recursively browse the objects it references
  • Convert each browsed object name into a memory-efficient Python object, such as an integer or a raw byte string (but not a string holding the hexadecimal representation of the SHA-1!)
  • If the name is not already in a set, add it and keep browsing; otherwise skip that branch, since it has already been processed
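The traversal above could look like the following sketch. It assumes, hypothetically, that labels are given as a dict mapping each label to the SHA-1 hex name of its root object, and that `refs` maps each object's hex name to the hex names it references; `to_key` and `mark` are illustrative names. Names are kept as 20-byte `bytes` (via `bytes.fromhex`) rather than 40-character hex strings, and an explicit stack replaces recursion so that very deep trees cannot hit Python's recursion limit.

```python
from typing import Dict, List, Set


def to_key(hex_name: str) -> bytes:
    # 40 hex characters -> 20 raw bytes: a smaller set entry than the hex string
    return bytes.fromhex(hex_name)


def mark(labels: Dict[str, str], refs: Dict[str, List[str]]) -> Set[bytes]:
    """Return the keys of every object reachable from a label.

    `labels` and `refs` are hypothetical in-memory stand-ins for the pool.
    """
    reachable: Set[bytes] = set()
    for root in labels.values():
        stack = [root]
        while stack:
            name = stack.pop()
            key = to_key(name)
            if key in reachable:
                # Already visited: skip the whole branch below this object
                continue
            reachable.add(key)
            stack.extend(refs.get(name, ()))
    return reachable
```

Because a shared subtree is skipped as soon as its root is found in the set, each object is expanded at most once even when many labels point into the same data.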

At the end, you will have a set of all referenced objects. Then iterate over the objects in storage and remove those that are not in the set.
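A possible sweep step, as a sketch under hypothetical assumptions: storage is modelled as an iterable of object hex names plus a `delete` callable that removes one object. Neither reflects Marty's actual storage API, and `sweep` is an illustrative name.

```python
from typing import Callable, Iterable, List, Set


def sweep(stored_names: Iterable[str], reachable: Set[bytes],
          delete: Callable[[str], object]) -> List[str]:
    """Delete every stored object whose 20-byte key is not in `reachable`."""
    removed: List[str] = []
    # Snapshot the listing first so that `delete` may safely mutate the storage
    for name in list(stored_names):
        if bytes.fromhex(name) not in reachable:
            delete(name)
            removed.append(name)
    return removed
```

For example, with a plain dict as the store, `sweep(store.keys(), reachable, store.pop)` deletes the unreferenced entries in place and returns their names.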

A lock may be required to avoid removing objects created by a backup that is still running.
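One simple way to get such a lock, offered only as an assumption, is a lock file created atomically with `O_CREAT | O_EXCL`: if both backups and gc acquire it, gc can never sweep while a backup is still writing objects. `pool_lock` and any lock path passed to it are hypothetical.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def pool_lock(lock_path: str) -> Iterator[None]:
    # Hypothetical exclusive pool lock. O_CREAT | O_EXCL raises FileExistsError
    # if the lock file already exists, so only one holder (backup or gc) at a time
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        # Record the holder's PID to help diagnose a stale lock
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)
```

This is a deliberately coarse design: a lock file left behind by a crashed process must be removed by hand, and allowing several backups to run concurrently while still excluding gc would need a shared/exclusive lock instead.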