edsu / dedoop

recursively deduplicate a directory and write its contents to a new directory while remembering the old paths


Cloud storage (v2)

edsu opened this issue

At MITH we want to update dedoop significantly to write to cloud storage (Amazon S3, Google Cloud Storage, Microsoft Azure, etc.) using Apache libcloud. Instead of writing sequentially named files, it will use the sha256 checksum as the object name, and will store the file's media type, original name, and last modification time as object metadata. dedoop will retain the ability to write to the local filesystem; in that case it will keep the file extension, and the file metadata will continue to be written to a JSON file.
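A rough sketch of what that naming and metadata scheme might look like. The `object_descriptor` and `upload` function names, the metadata keys, and the S3 choice are illustrative assumptions, not settled design; the upload uses libcloud's standard `get_driver` / `upload_object` API:

```python
import hashlib
import mimetypes
import os


def object_descriptor(path):
    """Compute the sha256 object name and the metadata dedoop would store."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    media_type, _ = mimetypes.guess_type(path)
    return {
        # sha256 checksum becomes the object name (no extension in the cloud)
        "object_name": digest.hexdigest(),
        "meta_data": {
            "original_name": os.path.basename(path),
            "last_modified": str(os.stat(path).st_mtime),
            "media_type": media_type or "application/octet-stream",
        },
    }


def upload(path, container_name, key, secret):
    """Hypothetical upload via Apache libcloud (S3 shown; other providers
    would just swap the driver class)."""
    from libcloud.storage.providers import get_driver
    from libcloud.storage.types import Provider

    desc = object_descriptor(path)
    driver = get_driver(Provider.S3)(key, secret)
    container = driver.get_container(container_name)
    driver.upload_object(
        path,
        container,
        desc["object_name"],
        extra={
            "meta_data": desc["meta_data"],
            "content_type": desc["meta_data"]["media_type"],
        },
    )
```

Because the content hash is the object name, re-ingesting an identical file maps to the same object, which is what gives the deduplication for free.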

If you want to read more of the details you can find them here:

https://app.gitbook.com/@lakeland-digital-archive/s/design-documents/ingest-utility

Please let us know here if you have any thoughts about any of this!