zrepl/zrepl

One-stop ZFS backup & replication solution

Home Page: https://zrepl.github.io

Feature request: Snapshot bucket diagram of current snaps

Halfwalker opened this issue · comments

With more complex snapshot and pruning strategies, it can get a little complicated to ensure that you have the right configuration. It would be really nice to have an option on the zrepl status page to show the current list of snapshots for a job and how they fall into the buckets. Even better: show the current situation and what it would look like after the next run of the snapshot process.

For example, given this snapshot/pruning config (from the examples):

# see /usr/share/doc/zrepl/examples
# or https://zrepl.github.io/configuration/overview.html

  # this job takes care of snapshot creation + pruning
  - name: snaphome
    type: snap
    filesystems: {
        "half2/home/alice<": true,
    }
    # create snapshots with prefix `zrepl_` every 15 minutes
    snapshotting:
      type: periodic
      interval: 15m
      timestamp_format: human
      prefix: zrepl_
    pruning:
      keep:
      - type: grid
        grid: 1x1h(keep=all) | 2x2h | 1x3h
        regex: "^zrepl_.*"
      # keep all snapshots that don't have the `zrepl_` prefix
      - type: regex
        negate: true
        regex: "^zrepl_.*"

Without pruning, it would eventually produce this list of ZFS snapshots:

half2/home/alice@zrepl_2023-06-01_00:00:00
half2/home/alice@zrepl_2023-06-01_00:15:00
half2/home/alice@zrepl_2023-06-01_00:30:00
half2/home/alice@zrepl_2023-06-01_00:45:00
half2/home/alice@zrepl_2023-06-01_01:00:00
half2/home/alice@zrepl_2023-06-01_01:15:00
half2/home/alice@zrepl_2023-06-01_01:30:00
half2/home/alice@zrepl_2023-06-01_01:45:00
half2/home/alice@zrepl_2023-06-01_02:00:00
half2/home/alice@zrepl_2023-06-01_02:15:00
half2/home/alice@zrepl_2023-06-01_02:30:00
half2/home/alice@zrepl_2023-06-01_02:45:00
half2/home/alice@zrepl_2023-06-01_03:00:00
half2/home/alice@zrepl_2023-06-01_03:15:00
half2/home/alice@zrepl_2023-06-01_03:30:00
half2/home/alice@zrepl_2023-06-01_03:45:00
half2/home/alice@zrepl_2023-06-01_04:00:00
half2/home/alice@zrepl_2023-06-01_04:15:00
half2/home/alice@zrepl_2023-06-01_04:30:00
half2/home/alice@zrepl_2023-06-01_04:45:00
half2/home/alice@zrepl_2023-06-01_05:00:00
half2/home/alice@zrepl_2023-06-01_05:15:00
half2/home/alice@zrepl_2023-06-01_05:30:00
half2/home/alice@zrepl_2023-06-01_05:45:00
half2/home/alice@zrepl_2023-06-01_06:00:00
half2/home/alice@zrepl_2023-06-01_06:15:00
half2/home/alice@zrepl_2023-06-01_06:30:00
half2/home/alice@zrepl_2023-06-01_06:45:00
half2/home/alice@zrepl_2023-06-01_07:00:00
half2/home/alice@zrepl_2023-06-01_07:15:00
half2/home/alice@zrepl_2023-06-01_07:30:00
half2/home/alice@zrepl_2023-06-01_07:45:00
half2/home/alice@zrepl_2023-06-01_08:00:00
half2/home/alice@zrepl_2023-06-01_08:15:00
half2/home/alice@zrepl_2023-06-01_08:30:00
half2/home/alice@zrepl_2023-06-01_08:45:00
half2/home/alice@zrepl_2023-06-01_09:00:00
half2/home/alice@zrepl_2023-06-01_09:15:00
half2/home/alice@zrepl_2023-06-01_09:30:00
half2/home/alice@zrepl_2023-06-01_09:45:00

These are the buckets those snapshots would fall into; starred entries are the ones kept by the rules:

half2/home/alice@zrepl_2023-06-01_00:00:00
half2/home/alice@zrepl_2023-06-01_00:15:00
half2/home/alice@zrepl_2023-06-01_00:30:00
half2/home/alice@zrepl_2023-06-01_00:45:00
half2/home/alice@zrepl_2023-06-01_01:00:00
half2/home/alice@zrepl_2023-06-01_01:15:00
half2/home/alice@zrepl_2023-06-01_01:30:00
half2/home/alice@zrepl_2023-06-01_01:45:00
half2/home/alice@zrepl_2023-06-01_02:00:00  * 1x3h(keep=1)
half2/home/alice@zrepl_2023-06-01_02:15:00  | 1x3h(keep=1)
half2/home/alice@zrepl_2023-06-01_02:30:00  | 1x3h(keep=1)
half2/home/alice@zrepl_2023-06-01_02:45:00  | 1x3h(keep=1)
half2/home/alice@zrepl_2023-06-01_03:00:00  | 1x3h(keep=1)
half2/home/alice@zrepl_2023-06-01_03:15:00  | 1x3h(keep=1)
half2/home/alice@zrepl_2023-06-01_03:30:00  | 1x3h(keep=1)
half2/home/alice@zrepl_2023-06-01_03:45:00  | 1x3h(keep=1)
half2/home/alice@zrepl_2023-06-01_04:00:00  | 1x3h(keep=1)
half2/home/alice@zrepl_2023-06-01_04:15:00  | 1x3h(keep=1)
half2/home/alice@zrepl_2023-06-01_04:30:00  | 1x3h(keep=1)
half2/home/alice@zrepl_2023-06-01_04:45:00  | 1x3h(keep=1)
half2/home/alice@zrepl_2023-06-01_05:00:00  * 2x2h(keep=1) bucket B
half2/home/alice@zrepl_2023-06-01_05:15:00  | 2x2h(keep=1) bucket B
half2/home/alice@zrepl_2023-06-01_05:30:00  | 2x2h(keep=1) bucket B
half2/home/alice@zrepl_2023-06-01_05:45:00  | 2x2h(keep=1) bucket B
half2/home/alice@zrepl_2023-06-01_06:00:00  | 2x2h(keep=1) bucket B
half2/home/alice@zrepl_2023-06-01_06:15:00  | 2x2h(keep=1) bucket B
half2/home/alice@zrepl_2023-06-01_06:30:00  | 2x2h(keep=1) bucket B
half2/home/alice@zrepl_2023-06-01_06:45:00  | 2x2h(keep=1) bucket B
half2/home/alice@zrepl_2023-06-01_07:00:00  * 2x2h(keep=1) bucket A
half2/home/alice@zrepl_2023-06-01_07:15:00  | 2x2h(keep=1) bucket A
half2/home/alice@zrepl_2023-06-01_07:30:00  | 2x2h(keep=1) bucket A
half2/home/alice@zrepl_2023-06-01_07:45:00  | 2x2h(keep=1) bucket A
half2/home/alice@zrepl_2023-06-01_08:00:00  | 2x2h(keep=1) bucket A
half2/home/alice@zrepl_2023-06-01_08:15:00  | 2x2h(keep=1) bucket A
half2/home/alice@zrepl_2023-06-01_08:30:00  | 2x2h(keep=1) bucket A
half2/home/alice@zrepl_2023-06-01_08:45:00  | 2x2h(keep=1) bucket A
half2/home/alice@zrepl_2023-06-01_09:00:00  * 1x1h(keep=all)
half2/home/alice@zrepl_2023-06-01_09:15:00  * 1x1h(keep=all)
half2/home/alice@zrepl_2023-06-01_09:30:00  * 1x1h(keep=all)
half2/home/alice@zrepl_2023-06-01_09:45:00  * 1x1h(keep=all)

When the pruning policies are applied, the result is the actual current list of snapshots. This would be a great list to show all by itself:

half2/home/alice@zrepl_2023-06-01_02:00:00  | 1x3h(keep=1)
half2/home/alice@zrepl_2023-06-01_05:00:00  | 2x2h(keep=1) bucket B
half2/home/alice@zrepl_2023-06-01_07:00:00  | 2x2h(keep=1) bucket A
half2/home/alice@zrepl_2023-06-01_09:00:00  | 1x1h(keep=all)
half2/home/alice@zrepl_2023-06-01_09:15:00  | 1x1h(keep=all)
half2/home/alice@zrepl_2023-06-01_09:30:00  | 1x1h(keep=all)
half2/home/alice@zrepl_2023-06-01_09:45:00  | 1x1h(keep=all)
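That kept set can be reproduced with a small simulation. This is only a sketch, not zrepl's actual implementation: it assumes the grid is anchored at the creation time of the newest matching snapshot, and that keep=1 retains the oldest snapshot in each interval, which is what the starred diagram above shows.

```python
from datetime import datetime, timedelta

def prune_grid(snaps, grid):
    """Simulate grid pruning.  `grid` is a list of (length, keep_all) tuples,
    most-recent interval first; `snaps` are snapshot creation times.
    Returns the snapshots that would be kept.  Assumption: the grid is
    anchored at the newest snapshot, and keep=1 keeps the oldest per interval."""
    newest = max(snaps)
    # Cumulative offsets: bucket i covers ages [offsets[i], offsets[i+1]).
    offsets = [timedelta(0)]
    for length, _ in grid:
        offsets.append(offsets[-1] + length)
    buckets = [[] for _ in grid]
    for s in sorted(snaps):  # oldest first
        age = newest - s
        for i in range(len(grid)):
            if offsets[i] <= age < offsets[i + 1]:
                buckets[i].append(s)
                break
        # snapshots older than the whole grid match no bucket: pruned
    kept = []
    for (_, keep_all), bucket in zip(grid, buckets):
        if bucket:
            kept.extend(bucket if keep_all else bucket[:1])  # [0] is oldest
    return sorted(kept)

# "1x1h(keep=all) | 2x2h | 1x3h" expanded into individual intervals
grid = [(timedelta(hours=1), True),
        (timedelta(hours=2), False),
        (timedelta(hours=2), False),
        (timedelta(hours=3), False)]

base = datetime(2023, 6, 1)
snaps = [base + timedelta(minutes=15 * i) for i in range(40)]  # 00:00 .. 09:45
for s in prune_grid(snaps, grid):
    print(s.strftime("zrepl_%Y-%m-%d_%H:%M:%S"))
```

Run against the 40 snapshots above, this prints exactly the seven survivors listed: 02:00, 05:00, 07:00, and the four keep=all snaps from 09:00 on.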

Even better would be to show what the list will look like after the next snap:

half2/home/alice@zrepl_2023-06-01_02:00:00  | **To Be Pruned**
half2/home/alice@zrepl_2023-06-01_05:00:00  | 1x3h(keep=1)
half2/home/alice@zrepl_2023-06-01_07:00:00  | 2x2h(keep=1) bucket B
half2/home/alice@zrepl_2023-06-01_09:00:00  | 2x2h(keep=1) bucket A
half2/home/alice@zrepl_2023-06-01_09:15:00  | 1x1h(keep=all)
half2/home/alice@zrepl_2023-06-01_09:30:00  | 1x1h(keep=all)
half2/home/alice@zrepl_2023-06-01_09:45:00  | 1x1h(keep=all)
half2/home/alice@zrepl_2023-06-01_10:00:00  | 1x1h(keep=all) Next snap

There might be a good way to show the before/after on a single screen:

1x3h(keep=1)          | half2/home/alice@zrepl_2023-06-01_02:00:00 | **To Be Pruned**
2x2h(keep=1) bucket B | half2/home/alice@zrepl_2023-06-01_05:00:00 | 1x3h(keep=1)
2x2h(keep=1) bucket A | half2/home/alice@zrepl_2023-06-01_07:00:00 | 2x2h(keep=1) bucket B
1x1h(keep=all)        | half2/home/alice@zrepl_2023-06-01_09:00:00 | 2x2h(keep=1) bucket A
1x1h(keep=all)        | half2/home/alice@zrepl_2023-06-01_09:15:00 | 1x1h(keep=all)
1x1h(keep=all)        | half2/home/alice@zrepl_2023-06-01_09:30:00 | 1x1h(keep=all)
1x1h(keep=all)        | half2/home/alice@zrepl_2023-06-01_09:45:00 | 1x1h(keep=all)
Next snap             | half2/home/alice@zrepl_2023-06-01_10:00:00 | 1x1h(keep=all)

While this is just a simple example, with multiple datasets, manual snapshots, other automated snapshots, etc., the list of snaps and what will happen to them can get cumbersome. This kind of diagram or list would make things a lot clearer, especially if it could reach out to remote backup systems and pull their snapshot lists.

Another thought: there might be a way to show which pruning rule is the one keeping a snapshot. If there were a snapshot created manually, it would look something like this:

 Current                                                                  After snapshot
---------------------------+--------------------------------------------+--------------------------
1x3h(keep=1)               | half2/home/alice@zrepl_2023-06-01_02:00:00 | **To Be Pruned**
2x2h(keep=1) bucket B      | half2/home/alice@zrepl_2023-06-01_05:00:00 | 1x3h(keep=1)
2x2h(keep=1) bucket A      | half2/home/alice@zrepl_2023-06-01_07:00:00 | 2x2h(keep=1) bucket B
1x1h(keep=all)             | half2/home/alice@zrepl_2023-06-01_09:00:00 | 2x2h(keep=1) bucket A
1x1h(keep=all)             | half2/home/alice@zrepl_2023-06-01_09:15:00 | 1x1h(keep=all)
regex (negate) "^zrepl_.*" | half2/home/alice@manual_pre_install        | regex (negate) "^zrepl_.*"
1x1h(keep=all)             | half2/home/alice@zrepl_2023-06-01_09:30:00 | 1x1h(keep=all)
1x1h(keep=all)             | half2/home/alice@zrepl_2023-06-01_09:45:00 | 1x1h(keep=all)
Next snap                  | half2/home/alice@zrepl_2023-06-01_10:00:00 | 1x1h(keep=all)

The same goes for a last_n rule: show it as last_n(20) to indicate the count.

Color the snapshots to be pruned in red and the new snaps in green

Yes please!

It would be really nice to have an option in the zrepl status page to show the current list of snapshots for a job and how they fall into the buckets.

I like the terminal UI suggestions above.
I don't know how well they would scale with many filesystems etc.

Even better, show the current situation, and what it would look like after the next run of the snapshot process.

Yeah, dry-run pruning has been requested for a long time: #658


I'm looking into revising snapshot management in zrepl in general as the grid-based pruning policy has been difficult to understand for most users.

So I don't know whether it's worth investing tons of time into making grid easier to understand right now.
It wouldn't be too hard a contribution though, so if someone wants to flex their Go muscles, go for it.

For the UI with lots of filesystems, etc., perhaps make the display foldable at the dataset level?

I definitely like the grid approach - it's deterministic. I've been using zfs-autosnapshot for a long time, and while it works, the snap management sometimes misses. I use a threshold check against the WRITTEN property of a dataset to determine whether to snap or not. That can wind up with a daily snap being the latest, skipping over the monthly. The daily eventually ages out after 30 days, but there is no monthly to carry on for 12 months.

The grid approach using the creation-date property is much more stable. Hopefully you'll leave the grid as an option and just add other ways to manage snaps?

I'm thinking of having separate trains for hourly, daily, weekly, monthly, etc., distinguished by a ZFS property on the snapshot, or by the snapshot name, whatever.

A snapshot will always be on one of those trains, and never switch trains.

So, if it's 00:00, zrepl will create multiple snapshots of practically the same filesystem state, each with an independent lifespan. One snapshot will be on the hourly train, one on the daily train, one on the weekly train, ...

Along with the specification of which trains you have, you specify pruning in a very simple and predictable way: for hourly, retain the most recent 24 snapshots. For daily, the most recent 7. For weekly, ....
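The per-train retention described here is simple enough to sketch. The train names, counts, and the prune_trains helper below are all illustrative, not an actual zrepl config format:

```python
from collections import defaultdict

# Hypothetical per-train retention counts (illustrative only).
retain = {"hourly": 24, "daily": 7, "weekly": 4, "monthly": 12}

def prune_trains(snapshots, retain):
    """`snapshots` is a list of (train, creation_time) pairs.  Each train
    is pruned independently: keep the most recent retain[train] snapshots,
    and a snapshot never switches trains."""
    by_train = defaultdict(list)
    for train, created in snapshots:
        by_train[train].append(created)
    kept = []
    for train, times in by_train.items():
        times.sort()                      # oldest first
        n = retain.get(train, 0)
        for t in (times[-n:] if n else []):  # newest n survive
            kept.append((train, t))
    return kept

# 30 hourly and 10 daily snapshots (integers stand in for timestamps):
snaps = [("hourly", i) for i in range(30)] + [("daily", i) for i in range(10)]
survivors = prune_trains(snaps, retain)  # 24 hourly + 7 daily remain
```

Because each train is pruned on its own, a daily snapshot can never "shadow" a weekly or monthly the way it can in threshold-based schemes.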

zrepl's holds and bookmarks ensure that this scheme won't cause replication conflicts / break incremental replications.

How does this sound to you? (Sorry for derailing this issue)

Hah - derail away

I like the idea of the trains with individual management of each. The only issue I see is the proliferation of snaps ... one might be taking high-resolution snaps at 15m intervals, plus hourly, daily, weekly, monthly. You wind up with a lot of snaps of the same state, just with different names, which can make the snap list cumbersome to read.

With zfs-autosnapshot I have the cron jobs set up to take the snaps in monthly/weekly/daily/hourly order, using a threshold check (over 10m written, take a snap). So if a monthly gets taken, the weekly/daily will NOT be taken, etc.

For example, a test box with that scheme looks like this:

ryzen22/home/alice@auto-snap_monthly-2023-06-01-0401        5.57M  28.7G  -      -
ryzen22/home/alice@auto-snap_daily-2023-06-01-0403          12.3M  28.7G  -      -
ryzen22/home/alice@auto-snap_daily-2023-06-02-0403          594M   28.9G  -      -
ryzen22/home/alice@auto-snap_daily-2023-06-03-0403          711M   29.0G  -      -
ryzen22/home/alice@auto-snap_weekly-2023-06-04-0402         3.51M  29.0G  -      -
ryzen22/home/alice@auto-snap_daily-2023-06-05-0403          439M   29.1G  -      -
ryzen22/home/alice@auto-snap_hourly-2023-06-05-1704         273M   29.1G  -      -
ryzen22/home/alice@auto-snap_hourly-2023-06-05-1804         238M   29.1G  -      -
ryzen22/home/alice@auto-snap_hourly-2023-06-05-1904         222M   29.1G  -      -

Hourly are always taken regardless since they age out more quickly.

But I have had instances where the daily is the latest, so subsequent weekly/monthly don't get taken. Then the daily ages out after 30 days, breaking the retention intent.
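A minimal simulation of that scheme shows how the retention gap arises. The tier_taken helper is hypothetical, the 10m threshold is assumed to mean 10 MiB, and the hourly always-taken exception is omitted for brevity:

```python
# Sketch of the cron scheme described above (illustrative, not the actual
# tool's code): tiers are checked in monthly/weekly/daily/hourly order, a
# tier only snaps if enough has been WRITTEN since the last snap, and the
# highest due tier shadows the lower ones.
THRESHOLD = 10 * 2**20  # assumed: "10m" means 10 MiB

def tier_taken(due_tiers, written_since_last_snap):
    """Return the single tier that snaps this run, or None."""
    if written_since_last_snap < THRESHOLD:
        return None
    for tier in ("monthly", "weekly", "daily", "hourly"):
        if tier in due_tiers:
            return tier  # higher tiers shadow the lower ones
    return None

# The failure mode: a daily fires while plenty has been written ...
assert tier_taken({"daily"}, 600 * 2**20) == "daily"
# ... then the weekly comes due, but little was written since that daily,
# so no weekly is ever taken for this period.  When the daily ages out
# after 30 days, nothing covers that window anymore.
assert tier_taken({"weekly", "daily"}, 2 * 2**20) is None
```

The per-train scheme proposed above avoids this because the weekly's retention no longer depends on how much was written since an unrelated daily snap.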