Implement a garbage collector
NaPs opened this issue
Antoine Millet commented
Implement a garbage collector that deletes objects in the pool that are no longer referenced by a label or by another object. Also implement the "gc" command, which will let users run this garbage collector from the CLI.
Basic algorithm idea
- Iterate over each label and recursively browse the objects it references
- Transform each browsed object name into a memory-efficient Python object such as an integer or a string (but not a string holding the hexadecimal representation of the SHA-1!)
- Add this name to a set if it is not already in it; otherwise skip processing of that branch, since it has already been visited
At the end, you will have a set of all referenced objects. Iterate over the list of objects in storage and remove those that are not in the set.
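
The steps above could be sketched roughly as follows. This is only an illustration, not the project's code: `compact`, `collect_garbage`, and the `labels`, `get_refs`, `stored_names` and `remove` parameters are hypothetical names standing in for whatever the storage layer actually provides. It assumes object names are 40-character hexadecimal SHA-1 strings, stores them as 20-byte raw digests, and uses an explicit stack instead of recursion so that deep reference chains do not hit Python's recursion limit.

```python
import binascii


def compact(name):
    """Turn a 40-character hex SHA-1 name into its 20-byte raw digest."""
    return binascii.unhexlify(name)


def collect_garbage(labels, get_refs, stored_names, remove):
    """Mark every object reachable from the labels, then sweep the rest.

    All four parameters are assumptions made for this sketch:
    labels       -- iterable of root object names, one per label
    get_refs     -- callable returning the names an object refers to
    stored_names -- iterable of every object name present in the pool
    remove       -- callable deleting one object from the pool
    """
    seen = set()
    stack = list(labels)
    while stack:
        name = stack.pop()
        key = compact(name)
        if key in seen:
            continue  # this branch has already been processed
        seen.add(key)
        stack.extend(get_refs(name))

    removed = []
    # Copy the names first so that remove() may safely mutate the pool.
    for name in list(stored_names):
        if compact(name) not in seen:
            remove(name)
            removed.append(name)
    return removed
```

The "gc" command could then be a thin wrapper that calls such a function and reports how many objects were removed.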
A lock may be required to avoid removing objects created by a backup that is still running, since such objects are not yet referenced by any label.
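
One possible way to do this, assuming backups and the garbage collector run as separate processes on a Unix system, is an advisory file lock: backups take a shared lock and the collector takes an exclusive one. The `pool_lock` helper and the lock file path are hypothetical; nothing in this issue fixes the locking mechanism.

```python
import contextlib
import fcntl


@contextlib.contextmanager
def pool_lock(path, exclusive):
    """Hold an advisory lock on `path` for the duration of the with-block.

    Hypothetical helper: backups would call it with exclusive=False so
    that several can run at once, the collector with exclusive=True so
    that it waits until no backup is running.
    """
    with open(path, "a") as handle:
        mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
        fcntl.flock(handle, mode)  # blocks until the lock is granted
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
```

With this, the collector would wrap its whole mark-and-sweep pass in `with pool_lock(lock_path, exclusive=True):`, where `lock_path` is a file chosen by the application.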