Data redaction tool for text files
This tool will help you ensure that sensitive information is not unintentionally sent out of your organization.
Troubleshooting sometimes requires your vendor to analyse your log files. Before you share them, sensitive information such as IP addresses, hostnames, email addresses and even personal information may need to be redacted or masked.
Most of the time, redaction relies on an engineer eyeballing the file, or searching for and replacing sensitive values by hand. This is prone to human error and can take up a lot of an engineer's time.
Redactor helps by maintaining a repository of patterns that can be reused to redact files in seconds. In testing, redacting a 4GB log file took less than a minute.
The tool is built so that developers can extend it by using redactor as a module, while users can simply install it and run it from the command line.
$ git clone https://github.com/ben-labs/redact-py.git
$ cd redact-py
$ pip install .
$ redactor -h
Sample Result:
You can create your own rule files and pass them to redactor with the -r flag. Each rule in a redaction rule file uses the following attributes:
Attribute | Description |
---|---|
pattern | Regular expression matching the text to redact |
mask | String that replaces each match of the pattern |
Description | Optional description of the rule |
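
To make the pattern and mask attributes concrete, here is a minimal sketch of how rules of this shape could be applied with Python's standard `re` module. It is not redactor's actual implementation or API: the `RULES` list, the example patterns, and the helper names `compile_rules`, `redact_line` and `redact_lines` are all assumptions made for illustration, and the on-disk rule file format is not reproduced here.

```python
import re

# Hypothetical in-memory rules. The keys mirror the attributes in the table
# above, but this is NOT the rule file format that redactor itself reads.
RULES = [
    {
        "pattern": r"\b(?:\d{1,3}\.){3}\d{1,3}\b",
        "mask": "<IP>",
        "description": "Strings that look like IPv4 addresses",
    },
    {
        # The description is optional, so it is omitted here.
        "pattern": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",
        "mask": "<EMAIL>",
    },
]


def compile_rules(rules):
    """Pre-compile each pattern once so it can be reused on every line."""
    return [(re.compile(rule["pattern"]), rule["mask"]) for rule in rules]


def redact_line(line, compiled_rules):
    """Apply every rule in order, replacing each match with its mask."""
    for regex, mask in compiled_rules:
        line = regex.sub(mask, line)
    return line


def redact_lines(lines, compiled_rules):
    """Lazily redact an iterable of lines, such as an open file handle."""
    for line in lines:
        yield redact_line(line, compiled_rules)
```

Because `redact_lines` is a generator, passing it an open file handle processes one line at a time, so a large log never has to fit in memory. Note that `re.sub` interprets backslash escapes in the replacement string, so a mask containing backslashes would need escaping in this sketch.
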
Flag | Description |
---|---|
-h, --help | Displays help message |
-r RULEFILE, --rulefile RULEFILE | Sets a custom rulefile |
-o OUTPATH, --outpath OUTPATH | Directory to write redacted files to. The directory is created if it does not exist. |
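
For developers wrapping similar behaviour in their own scripts, the flags above could be modelled with Python's `argparse` roughly as follows. This is a sketch under stated assumptions, not redactor's source: the positional `files` argument and the helper names `build_parser` and `ensure_outpath` are hypothetical, and only the `-h`, `-r` and `-o` flags come from the table.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser mirroring the documented flags (-h is added by argparse)."""
    parser = argparse.ArgumentParser(
        prog="redactor",
        description="Redact sensitive data from text files.",
    )
    # Assumption: input files are given as positional arguments.
    parser.add_argument("files", nargs="*", help="files to redact")
    parser.add_argument("-r", "--rulefile", help="sets a custom rulefile")
    parser.add_argument(
        "-o", "--outpath", help="directory to write redacted files to"
    )
    return parser


def ensure_outpath(outpath):
    """Create the output directory if it does not exist and return it as a Path."""
    path = Path(outpath)
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Using `mkdir(parents=True, exist_ok=True)` matches the documented behaviour of `-o`: the directory is created when missing and left untouched when it is already there.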