Command to check file integrity, written in Python and also packaged as a Windows .exe.

This command is intended to be used alongside backup scripts to detect changes and bit rot in files. The IntegrityChecker command generates (or verifies) files with checksum information for each file found in a folder. If the `--recursive` option is specified when creating checksums, a checksum file (`.ck++.nfo`) is created within each folder found recursively.

By default, `.nfo` files are YAML text files, but they can be set to JSON format (`.ck++.json.nfo`) using the `--json` option. All operations on files are based on the selected format.
```
usage: IntegrityChecker [-h] [-V] [-p] [-a] [-q] [-e] [-r] [-j] [-c | -f] [-o OUTPUT] [-i] [-t] [-s] [-v] [-d {1,2,3}] [path]

Create and check integrity checksums and hashes for files in folder

positional arguments:
  path

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         Print current version and exit
  -p, --absolutepath    Save absolute path
  -a, --all             Include all. Do NOT ignore files and folders starting with dot (.)
  -q, --quickadd        Quick add new files. Do not check or process already known files
  -e, --empty           Do not write hash files into empty folders (leaving them empty)
  -r, --recursive       Process subfolders recursively
  -j, --json            Check/Save hash file using JSON format instead of YAML
  -c, --csv             Generate a CSV file (; delimited) detailing all the files processed
  -f, --fastcsv         Generate a CSV file detailing all the files already processed (uses hash files)
  -o OUTPUT, --output OUTPUT
                        Output CSV contents to specified file. Ignored if a CSV argument is not present.
  -i, --ignore          Ignore already stored hashes (avoids change detection)
  -t, --test            Test mode. Does NOT write updates to hash files
  -s, --summary         Print a summary of processed files when finished
  -v, --verbose         Verbose mode
  -d {1,2,3}, --debuglevel {1,2,3}
                        Debug level: 1: Verbose (--verbose); 2: Detailed; 3: Debug
```
The option to export results to CSV can be run in two modes:

- `--csv`: the CSV is created by reading files and calculating hashes
- `--fastcsv`: the CSV is created just by reading the already existing hash files (it does not check coherence with the existing files or folders)

When a CSV file is created with the `--fastcsv` option, the format parameters (`--json`) are relevant.
The CSV file can be used for analytics such as:

- Looking for duplicated files
- Checking offline for changed files, files not accessed, ...
- Input for tools such as PowerBI
- Streaming for running processes (the CSV file is written incrementally), so you can know how many files have been processed in folder structures where hundreds of thousands or millions of files need to be processed. Samples:
  - Windows: `date ; Get-Content ck++.csv | Measure-Object -Line`
  - Linux: `date ; wc -l ck++.csv`
- Comparison between saved and actual hashes (`--fastcsv` and `--test` can provide complementary information without updating the existing hash files)
When processing folder structures with a high number of files, it is recommended to use the CSV options so you can later process all the information with additional tools.
Minor release versions should maintain `.nfo` file compatibility, so this check may become relevant:

```
IntegrityChecker -V
```
Example of recursively generating a CSV for a folder tree:

```
IntegrityChecker -rco file.csv Base-Folder-to-Explore/
```
If you don't want to generate hash files in folders where there are no files to hash:

```
IntegrityChecker -reco file.csv Base-Folder-to-Explore/
```
To generate the CSV from the already existing hash files only:

```
IntegrityChecker -rfo file.csv Base-Folder-to-Explore/
```
It can be useful to automate change detection between multiple runs, not only for existing files but also for detecting other changes such as:

- New files and folders created
- Files moved
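A minimal sketch of how two `{path: hash}` snapshots from consecutive runs could be compared to classify new, deleted, changed and moved files (illustrative only, not IntegrityChecker's internal logic):

```python
# Sketch: classify differences between two {path: hash} snapshots taken on
# separate runs. A file whose hash reappears under a different path is
# reported as moved. Illustration only, not the tool's actual algorithm.
def diff_snapshots(old, new):
    added = {p for p in new if p not in old}
    removed = {p for p in old if p not in new}
    changed = {p for p in new if p in old and new[p] != old[p]}
    # A removed path whose hash shows up among the added paths was likely moved.
    added_by_hash = {new[p]: p for p in added}
    moved = {p: added_by_hash[old[p]] for p in removed if old[p] in added_by_hash}
    added -= set(moved.values())
    removed -= set(moved)
    return {"new": added, "deleted": removed, "changed": changed, "moved": moved}
```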
Dependencies:

- pyyaml
- colorama
- File name for JSON files changed from `.ck++.nfoj` to `.ck++.json.nfo`
- `--debug` replaced by debug level `--debuglevel`
- Description (out of verbose mode), in color, for New, Changed and unchanged (Ok) files
- `--summary` partly implemented to start offering results
- `--quickadd` (not refined) to add new files into checksums, ignoring the already existing ones
- Strings and constants moved to `constants.py`
- Version output (`--version`)
- Output CSV to specified file (`--output`)
- Option to avoid generating `.nfo` files in folders where nothing is being hashed (`--empty`)
- Export to CSV option (`--csv`)
- Generate a CSV without processing files, just by using existing `.nfo` files (`--fastcsv`)
- UTF-8 encoding by default
- Detected bugs (file management) solved
- Icon for the Windows .EXE executable
- Config file for using Auto-Py-to-Exe for .exe generation
- Recursive mode (`--recursive`)
- Debug mode (`--debug`)
- Store absolute path if required (`--absolutepath`)
- Include/ignore files starting with a dot (`--all`)
- Save results into `.nfo` files using JSON format (`--json`)
- Basic functionality to work and process parameters
List of features being considered to be implemented:

- [WIP] 'Continue'... to update only those files that are not already hashed
- Pre-scan: generate initial statistics of the files to process in order to offer an estimated progress bar or %
- Upgrade the summary:
  - Folders
    - Processed folders
    - Empty folders
  - Processed files (processed hashes)
  - Omitted files (currently only folders are considered)
- Offer multiple hashing algorithms
- Set the CSV field delimiter
- Comparison commands (i.e. live comparison or only-CSV-based) --> New command (testing WIP using Pandas)
- Mirror mode
- CSV analysis
  - Duplicated files
  - Changed files
  - Summary from CSV
  - Compare 2 CSVs to find different hashes on entries with the same metadata
    - Ignore relative paths if everything else is equal
- Check only one file (against the local .nfo or a general .csv) so it can be scripted
List of known bugs or limitations currently detected in the application:

- Nothing here yet... testing needed