elemental-lf / benji

Benji Backup: A block based deduplicating backup software for Ceph RBD images, iSCSI targets, image files and block devices

Home Page: https://benji-backup.me



Can Benji's deduplication help reduce WAN bandwidth?

luhaijiao opened this issue

This thread is just to clarify whether Benji's deduplication feature can help reduce WAN bandwidth in a disaster recovery (DR) scenario.

In our case, we have 20 remote sites acting as Ceph RBD sources (R1, R2, R3, ...) and one additional site, C1, as the centralized backup destination. We would like to deploy Benji at C1 to back up all remote sites through WAN connections.

Based on our limited knowledge of Benji, it can only deduplicate data at the site where it runs (in our case, C1). Is it possible to have Benji deduplicate data at the remote sites before backing it up to C1? This kind of 'inline deduplication' would significantly reduce the WAN bandwidth requirement and certainly save a lot of money.

Thanks!

  • If you deploy Benji instances at each remote site but use one central database located at your central site, Benji will only write unique blocks to the storage, which I assume is also at the central site. This saves bandwidth because the blocks are deduplicated and compressed (if enabled) before they go over the WAN connection.
  • If you only have one Benji instance at the central site, the deduplication happens there. The Ceph RBD diff feature, which Benji normally uses, would still help by restricting the backup to changed blocks only.

My recommendation would be to deploy Benji instances at each remote site if you have the compute resources there.

Thanks for the prompt reply!
Option #1 looks great, but I assume we also need to deploy one Benji instance at the central site, so that if any remote site is completely down, we can still use the central Benji instance to restore from the backup. Is that correct?

BTW, is there any recommended spec for the Benji server in terms of CPU, memory and disk if we enable both deduplication and compression (level 3)?

Yes, just deploy the same configuration at the central site. I'd suggest to roll out all Benji instances via some kind of automation so that all configurations stay in sync.
Apart from some restore scenarios, you shouldn't need much disk space at all. CPU and memory requirements largely depend on your environment, and you'll have to test that out yourself. Benji is mainly I/O bound, though.

Thanks a lot for your clarification, closing this issue now.