tf-maam / dupsrm

dupsrm

License: GPL v3

Command line tool to remove duplicated files. It recursively walks a reference directory and a root directory, finds files in the reference directory tree that duplicate files in the root directory tree, and removes those duplicates from the reference directory tree.

dupsrm: duplicates removal

Usage

Remove duplicated files in the reference directory that are found in the root directory tree

Usage: dupsrm [OPTIONS] <REFERENCE_DIR> <ROOT_DIR>

Arguments:
  <REFERENCE_DIR>  Reference directory path
  <ROOT_DIR>       Root directory path

Options:
  -n, --dry-run
          Perform a dry-run without removing any file
  -r, --regex <REGEX>
          Regular expression filtering files in reference directories
  -a, --hash-algorithm <HASH_ALGORITHM>
          Hash algorithm [default: SHA2-256] [possible values: SHA2-256, SHA3-256, SHA1, MD5, WHIRLPOOL, RIPEMD-160, BLAKE-256]
  -h, --help
          Print help
  -V, --version
          Print version
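
To illustrate what the -n, --dry-run option implies, here is a minimal sketch of a removal step that only reports files when the flag is set. The function name remove_files and its signature are hypothetical and are not taken from the dupsrm source; the actual implementation may differ.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical helper: remove the given files, or only report them
/// when `dry_run` is set. Returns the number of files that were
/// (or, in a dry run, would be) removed.
fn remove_files(paths: &[PathBuf], dry_run: bool) -> io::Result<usize> {
    for path in paths {
        if dry_run {
            // Dry run: leave the file system untouched.
            println!("would remove {}", path.display());
        } else {
            fs::remove_file(path)?;
            println!("removed {}", path.display());
        }
    }
    Ok(paths.len())
}
```

Running with the dry-run flag first and reviewing the reported paths is a cheap safeguard before any file is actually deleted.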

Use cases:

  • Removing outdated backups
  • Cleaning the Downloads folder of copied and possibly renamed files
  • Saving disk space

Installation

cargo build --release
cargo install --path .

Profiling

# remove stale coverage data from earlier runs (-f: no error if nothing matches)
rm -f default_*.profraw dupsrm.profdata
cargo clean
# profile execution
RUSTFLAGS="-C instrument-coverage" cargo build
target/debug/dupsrm test/ .
# or profile tests
RUSTFLAGS="-C instrument-coverage" cargo test --tests

llvm-profdata merge -sparse default_*.profraw -o dupsrm.profdata
llvm-cov report --use-color --ignore-filename-regex='/.cargo/registry' --instr-profile=dupsrm.profdata --object target/debug/dupsrm
llvm-cov show --use-color --ignore-filename-regex='/.cargo/registry' --instr-profile=dupsrm.profdata --object target/debug/dupsrm

TODOs

  • Recursively iterate root and reference directories
  • Calculate the hash of each file and store it in a list alongside the path
  • Create a list of duplicates in the reference directory
  • Add command line interface to define reference and root paths
    See the Rust CLI book for further details. Use clap for command line argument parsing
  • Add the method to remove files
  • Add the command line flags -n, --dry-run to skip removing files, as in git rm
  • Modularize source code into different files
  • Add additional unit tests with an example file structure
  • Create a docker container for running build tests
  • Create GitHub and GitLab CI
  • Modularize the hash function to allow the usage of other hash algorithms
  • Benchmark implementation using cargo-bench
  • Parallelize iterators and hashing of files in multiple threads
  • Write documentation with usage examples
  • Extend logger output
  • Use a hashmap to find duplicated hashes, reducing the computational complexity
  • Add a filter for file types or regex support
  • Use PathBuf instead of String for paths
  • Wrap hash type with &str or a fixed-size type
  • Add a flag to not recurse into the reference directory or to set a maximum depth
  • Provide usage examples with regular expressions
  • Add an option to create symlinks or hard links to original files, replacing the removed files in the reference directory
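
The first items of the list (recursive iteration, hashing each file, and using a hash-based lookup to find duplicates) can be sketched as follows. This is a minimal illustration using only the standard library, not the dupsrm implementation: the function names are hypothetical, DefaultHasher stands in for the cryptographic algorithms listed under --hash-algorithm, and skipping root files that lie inside the reference directory is an assumption made so that a file never matches itself.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::fs;
use std::hash::Hasher;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect all regular files below `dir` (symlinks are skipped).
fn collect_files(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            collect_files(&entry.path(), out)?;
        } else if file_type.is_file() {
            out.push(entry.path());
        }
    }
    Ok(())
}

/// Hash the full contents of a file.
/// Assumption: DefaultHasher is a non-cryptographic stand-in; a real tool
/// would use a digest such as SHA2-256 and stream large files.
fn hash_file(path: &Path) -> io::Result<u64> {
    let bytes = fs::read(path)?;
    let mut hasher = DefaultHasher::new();
    hasher.write(&bytes);
    Ok(hasher.finish())
}

/// Return the files under `reference` whose content hash also occurs
/// under `root`. Set lookups keep this linear in the number of files.
fn find_duplicates(reference: &Path, root: &Path) -> io::Result<Vec<PathBuf>> {
    let reference = fs::canonicalize(reference)?;
    let root = fs::canonicalize(root)?;

    // Hash every root file, ignoring files inside the reference tree.
    let mut root_files = Vec::new();
    collect_files(&root, &mut root_files)?;
    let mut root_hashes = HashSet::new();
    for path in root_files.iter().filter(|p| !p.starts_with(&reference)) {
        root_hashes.insert(hash_file(path)?);
    }

    // A reference file is a duplicate if its hash was seen in the root tree.
    let mut reference_files = Vec::new();
    collect_files(&reference, &mut reference_files)?;
    let mut duplicates = Vec::new();
    for path in reference_files {
        if root_hashes.contains(&hash_file(&path)?) {
            duplicates.push(path);
        }
    }
    duplicates.sort();
    Ok(duplicates)
}
```

Because matching is done on content hashes rather than file names, renamed copies are detected as well. With a 64-bit non-cryptographic hash, collisions are possible, so a careful tool would either use a strong digest or compare file contents before deleting anything.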

About

License: GNU General Public License v3.0

