ArtLabss / open-data-anonymizer

Python Data Anonymization & Masking Library For Data Science Tasks

Home Page: https://www.artlabs.tech

Repository from GitHub: https://github.com/ArtLabss/open-data-anonymizer

[FEATURE] PDF redaction of sensitive information

rscmendes opened this issue

Is your feature request related to a problem? Please describe.
The current PDF anonymization mechanism only protects printed documents. Drawing a black box over text does not remove that text from the PDF file itself, so anyone who receives the digital document can still extract the sensitive information (for example, by selecting and copying it).
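To make the problem concrete, here is a minimal Python sketch (not code from this library) that uses a hand-written, uncompressed page content stream. The stream, the coordinates, and the helper names `add_black_box` and `shown_text_strings` are hypothetical. Appending fill operators paints a rectangle over the text, but the `Tj` operand that carries the name is left untouched, so a text extractor can still read it.

```python
import re

# Hypothetical, simplified page content stream (uncompressed, literal strings only).
PAGE_CONTENT = b"""BT
/F1 12 Tf
72 700 Td
(Name: John Doe) Tj
ET
"""

# Matches a literal string operand followed by the Tj (show text) operator.
# Assumes no unescaped nested parentheses inside the string.
TJ_PATTERN = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")


def add_black_box(content: bytes, x: float, y: float, w: float, h: float) -> bytes:
    """Append operators that fill a black rectangle: q/Q save and restore the
    graphics state, rg sets the fill colour, re adds the rectangle, f fills it.
    This changes only what is rendered, not the text operators."""
    box = f"q 0 0 0 rg {x} {y} {w} {h} re f Q\n".encode("ascii")
    return content + box


def shown_text_strings(content: bytes) -> list[str]:
    """Return the raw string operands of Tj operators, roughly what a naive
    text extractor would pull out of this stream."""
    return [m.group(1).decode("latin-1") for m in TJ_PATTERN.finditer(content)]


masked = add_black_box(PAGE_CONTENT, 110, 695, 60, 16)
print(shown_text_strings(masked))  # the name is still there: ['Name: John Doe']
```

Real files usually compress their content streams, but decompressing them exposes the same operands, so the conclusion does not change.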

Describe the solution you'd like
Two things:

  1. Please add a warning to the PDF black-box section of README.md stating that this feature only protects printed copies of the document, not digital copies.
  2. Please consider adding a PDF rewriting option in which a string such as "[REDACTED]" directly replaces the sensitive text, instead of a black box being drawn over it.
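A rewriting approach edits the text operands themselves instead of painting over them. The sketch below is a hypothetical illustration, not the library's API: it assumes an uncompressed content stream in which each sensitive value sits inside a single literal string shown by `Tj`, and the name `redact_text` is made up for this example.

```python
import re

# Hypothetical, simplified page content stream: uncompressed, one literal string per Tj.
PAGE_CONTENT = b"""BT
/F1 12 Tf
72 700 Td
(Name: John Doe) Tj
0 -14 Td
(Account: 12345) Tj
ET
"""

# Captures a literal string operand and the Tj operator that shows it.
TJ_PATTERN = re.compile(rb"\(((?:\\.|[^\\()])*)\)(\s*Tj)")
# Characters that would need escaping inside a PDF literal string.
SPECIAL = set("\\()")


def redact_text(content: bytes, sensitive: list[str], replacement: str = "[REDACTED]") -> bytes:
    """Replace sensitive substrings inside Tj operands so they are gone from the stream."""
    # Keep the sketch simple: refuse terms that would require PDF string escaping.
    for s in [*sensitive, replacement]:
        if SPECIAL & set(s):
            raise ValueError(f"unsupported characters in {s!r}")

    def rewrite(match: re.Match) -> bytes:
        text = match.group(1).decode("latin-1")
        for term in sensitive:
            text = text.replace(term, replacement)
        return b"(" + text.encode("latin-1") + b")" + match.group(2)

    return TJ_PATTERN.sub(rewrite, content)


redacted = redact_text(PAGE_CONTENT, ["John Doe", "12345"])
print(redacted.decode("latin-1"))
```

Real PDFs often break these assumptions (compressed streams, text split across `TJ` arrays, font-specific encodings), and rewriting a stream also means updating its `/Length` entry and the cross-reference table, so a production version would likely build on a PDF library. For example, PyMuPDF provides `Page.search_for()`, `Page.add_redact_annot()` and `Page.apply_redactions()`, which remove the content under the marked areas and can insert replacement text such as "[REDACTED]".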

Let me know if you are open to contributions; I would gladly help implement this feature.

Describe alternatives you've considered

Additional context

PS: there is a typo in your feature request template: the default title is "[FEAUTRE]" instead of "[FEATURE]".