Keep in mind that this is my first Rust project. If it sucks, you know why.
- .da deduplicating archive
- .di deduplicating index
- Compress/Decompress
- Compression presets (per chunk)
  - None
  - Lizard/lzturbo for speed.
  - Zstd for speed/efficiency.
  - Test multiple methods, keep the best (see the sketch below).
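A minimal sketch of the "test multiple methods, keep the best" idea, assuming the zstd and lz4_flex crates from the links at the bottom (Lizard/lzturbo have no crate in that list, so they are left out); the Method numbering is a placeholder, not a decision:

```rust
/// Compression method tag stored per chunk (numbering is a placeholder).
#[derive(Clone, Copy, Debug)]
enum Method {
    None = 0,
    Lz4 = 1,
    Zstd = 2,
}

/// Compress one chunk with every candidate method and keep the smallest
/// output, falling back to storing the chunk uncompressed.
fn compress_best(chunk: &[u8]) -> (Method, Vec<u8>) {
    let mut best = (Method::None, chunk.to_vec());
    let lz4 = lz4_flex::compress_prepend_size(chunk);
    if lz4.len() < best.1.len() {
        best = (Method::Lz4, lz4);
    }
    if let Ok(z) = zstd::encode_all(chunk, 3) {
        if z.len() < best.1.len() {
            best = (Method::Zstd, z);
        }
    }
    best
}
```

Trying every method on every chunk costs CPU, so this would probably sit behind its own preset.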
- Fast CDC chunking settings (per archive)
  These only apply when creating an archive for the first time; when updating an archive, the FastCDC settings are read from the index.
  - Presets (see the chunking sketch below)
    - Bigger chunks, for speed.
    - Smaller chunks as a middle-ground tradeoff.
    - Very small chunks for better dedup but worse compression.
  - Manual settings to select optimal values based on content type.
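A minimal sketch of the chunking pass using the fastcdc crate linked at the bottom; the preset sizes here are illustrative placeholders, not tuned values:

```rust
use fastcdc::v2020::FastCDC;

/// Split a buffer into content-defined chunks using the
/// "bigger chunks, for speed" preset (sizes are placeholders).
fn chunk_file(data: &[u8]) {
    let (min, avg, max) = (64 * 1024, 256 * 1024, 1024 * 1024);
    for chunk in FastCDC::new(data, min, avg, max) {
        // Each chunk is an (offset, length) into `data`; hashing the
        // slice gives the dedup key for fcdc_chunklist.
        let _slice = &data[chunk.offset..chunk.offset + chunk.length];
        println!("chunk at {} len {}", chunk.offset, chunk.length);
    }
}
```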
- Input/Output Files/Dirs
- Decompression path (defaults to working dir)
- Archive name
- Long help containing info on compression methods and such.
- Verbosity
- Version
Create an archive:
darc MyArc a -m3 -c2 -i File.png -i /home/user/somedir/ -i Song.mp3:/Music/
- Creates archive MyArc
- -m3 selects compression method 3
- -c2 selects chunking method 2
- -i File.png adds a file
- -i /home/user/somedir/ adds a directory
- -i Song.mp3:/Music/ adds a file with a custom archive path
The archive tree should look like:
File.png
somedir/[...]
Music/Song.mp3
Extract File.png from the archive:
darc MyArc d -i File.png -o ExtractDir
Update the archive by adding a new file:
darc MyArc u -i Code.c
Update the archive by replacing Somefile.conf with Newconf.conf:
darc MyArc u -i Newconf.conf:Somefile.conf -r Somefile.conf
List contents:
darc MyArc l
List contents using a glob:
darc MyArc l somedir/*.iso
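A rough sketch of this CLI surface using the clap crate with the derive feature (clap is an assumption, it is not in the link list); the subcommand and flag names mirror the examples above, everything else is a placeholder:

```rust
use clap::{Parser, Subcommand};

#[derive(Parser)]
#[command(version, long_about = "Long help with info on compression methods and such.")]
struct Cli {
    /// Archive name, e.g. MyArc
    archive: String,
    /// Verbosity (-v, -vv, ...)
    #[arg(short, long, action = clap::ArgAction::Count, global = true)]
    verbose: u8,
    #[command(subcommand)]
    command: Cmd,
}

#[derive(Subcommand)]
enum Cmd {
    /// Create an archive
    A {
        /// Compression method
        #[arg(short = 'm')]
        method: Option<u8>,
        /// Chunking preset
        #[arg(short = 'c')]
        chunking: Option<u8>,
        /// Inputs, optionally as source:archive_path
        #[arg(short = 'i')]
        inputs: Vec<String>,
    },
    /// Decompress files from the archive
    D {
        #[arg(short = 'i')]
        inputs: Vec<String>,
        /// Extraction path (defaults to the working dir)
        #[arg(short = 'o')]
        output: Option<String>,
    },
    /// Update the archive
    U {
        #[arg(short = 'i')]
        inputs: Vec<String>,
        /// Archive paths to remove
        #[arg(short = 'r')]
        remove: Vec<String>,
    },
    /// List contents, optionally filtered by a glob
    L { glob: Option<String> },
}

fn main() {
    let cli = Cli::parse();
    println!("archive: {}", cli.archive);
}
```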
An arc_filelist dict, saved in the index:
- file
- [hashes]
An fcdc_chunklist dict, used during compression:
- hash
- filepath
- chunk_start
- chunk_length
An arc_chunklist dict, saved in the index:
- hash
- compression_method
- chunk_start
- chunk_length
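A sketch of these three structures as Rust types; the concrete choices (HashMap, String paths, 32-byte hashes, u64 offsets) are assumptions, not decisions:

```rust
use std::collections::HashMap;

/// Chunk hash; 32 bytes assumes something like blake3 or sha256.
type ChunkHash = [u8; 32];

/// arc_filelist: archive path -> ordered hashes of the chunks in the file.
type ArcFilelist = HashMap<String, Vec<ChunkHash>>;

/// fcdc_chunklist: where each unique chunk lives in the *source* files.
struct FcdcChunk {
    filepath: String,
    chunk_start: u64,
    chunk_length: u64,
}
type FcdcChunklist = HashMap<ChunkHash, FcdcChunk>;

/// arc_chunklist: where each unique chunk lives in the *archive*.
#[derive(Clone)]
struct ArcChunk {
    compression_method: u8,
    chunk_start: u64,
    chunk_length: u64,
}
type ArcChunklist = HashMap<ChunkHash, ArcChunk>;
```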
If these objects do not exist, create them; if they do, read them.
Iterate over the files using FastCDC to fill in arc_filelist and fcdc_chunklist; if a chunk already exists in fcdc_chunklist, do not add it again.
Once done and with no pending removals, pack arc_filelist and write it to the index.
Iterate over fcdc_chunklist, read each chunk from its source file, filter it through the compression function, write an arc_chunklist entry, and write the chunk to the archive (see the sketch below).
Once done and with no pending removals, pack arc_chunklist and write it to the index.
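A sketch of that compression pass, reusing the types and compress_best from the sketches above; error handling and the pack/write step are elided:

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom, Write};

/// Read each unique chunk from its source file, compress it, append it
/// to the .da file, and record where it landed in arc_chunklist.
fn write_chunks(
    fcdc: &FcdcChunklist,
    arc: &mut ArcChunklist,
    archive: &mut File,
) -> std::io::Result<()> {
    for (hash, loc) in fcdc {
        let mut src = File::open(&loc.filepath)?;
        src.seek(SeekFrom::Start(loc.chunk_start))?;
        let mut chunk = vec![0u8; loc.chunk_length as usize];
        src.read_exact(&mut chunk)?;

        let (method, packed) = compress_best(&chunk);
        let chunk_start = archive.stream_position()?;
        archive.write_all(&packed)?;

        arc.insert(*hash, ArcChunk {
            compression_method: method as u8,
            chunk_start,
            chunk_length: packed.len() as u64,
        });
    }
    Ok(())
}
```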
Create temporary_array and arc_chunklist_new.
Parse arc_filelist for the filenames given and add their chunks to temporary_array. Then parse arc_filelist again, omitting the files that are to be removed, and remove from temporary_array the chunks that are still part of remaining files. Walk through arc_chunklist and add the chunks that are not in temporary_array to arc_chunklist_new (see the sketch below).
Pack and write the data.
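A sketch of that removal pass, modeling temporary_array as a HashSet on top of the types above; chunks referenced only by removed files end up in it, and everything else is carried over:

```rust
use std::collections::HashSet;

/// Drop `to_remove` from the filelist and rebuild the chunk list
/// (arc_chunklist_new) without the chunks that became unreferenced.
fn remove_files(
    filelist: &mut ArcFilelist,
    chunklist: &ArcChunklist,
    to_remove: &[String],
) -> ArcChunklist {
    // temporary_array: chunks belonging to the files being removed.
    let mut dead: HashSet<ChunkHash> = HashSet::new();
    for name in to_remove {
        if let Some(hashes) = filelist.remove(name) {
            dead.extend(hashes);
        }
    }
    // Un-mark any chunk a surviving file still references.
    for hashes in filelist.values() {
        for h in hashes {
            dead.remove(h);
        }
    }
    // Keep only the chunks that are still live.
    chunklist
        .iter()
        .filter(|(h, _)| !dead.contains(*h))
        .map(|(h, c)| (*h, c.clone()))
        .collect()
}
```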
Create an extract dict:
- chunk_start
- compression_method
- chunk_length
- [files]
Parse arc_filelist and arc_chunklist. For each filename, for each hash, add an entry to the extract dict; if an entry with the same chunk_start already exists, just add the filename to its files array.
Open a file descriptor for every filename. Then parse the extract dict in order: seek to chunk_start, read chunk_length bytes, decompress, and write to the relevant files, so that each chunk is only decompressed once. Maybe, if files contains only one entry, write directly to the fd; otherwise decompress into a buffer and then write it to the appropriate fds (see the sketch below).
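A sketch of that extraction pass. The extract dict is modeled as a BTreeMap keyed by chunk_start so iteration is already in archive order, and decompress_chunk is a placeholder for the inverse of the compression function:

```rust
use std::collections::{BTreeMap, HashMap};
use std::fs::File;
use std::io::{Read, Seek, SeekFrom, Write};

struct ExtractEntry {
    compression_method: u8,
    chunk_length: u64,
    files: Vec<String>, // every output file that needs this chunk
}

/// Placeholder for the inverse of the compression function.
fn decompress_chunk(_method: u8, packed: &[u8]) -> Vec<u8> {
    packed.to_vec()
}

fn extract(plan: &BTreeMap<u64, ExtractEntry>, archive: &mut File) -> std::io::Result<()> {
    // Open a file descriptor for every output file up front.
    let mut fds: HashMap<&str, File> = HashMap::new();
    for entry in plan.values() {
        for f in &entry.files {
            if !fds.contains_key(f.as_str()) {
                fds.insert(f, File::create(f)?);
            }
        }
    }
    // Walk the archive in order; each chunk is decompressed exactly once
    // and then written to every file that needs it.
    for (chunk_start, entry) in plan {
        archive.seek(SeekFrom::Start(*chunk_start))?;
        let mut packed = vec![0u8; entry.chunk_length as usize];
        archive.read_exact(&mut packed)?;
        let data = decompress_chunk(entry.compression_method, &packed);
        for f in &entry.files {
            fds.get_mut(f.as_str()).unwrap().write_all(&data)?;
        }
    }
    Ok(())
}
```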
Things that might be useful:
- https://docs.rs/crate/fastcdc/latest
- https://docs.rs/lzma-rs/0.2.0/lzma_rs/
- https://docs.rs/zstd/0.12.1+zstd.1.5.2/zstd/
- https://docs.rs/crate/brotli/3.3.4
- https://docs.rs/crate/lz4_flex/0.9.5