shubhankar5 / scrub-system-for-de-identification

A scrub system for de-identifying and cleaning data to keep it private.

scrub-system-for-de-identification

A scrub system that de-identifies and cleans data so that it stays private when shared with other organizations. We focus on a medical dataset here, since medical data is especially vulnerable to leakage, but the same algorithm can be applied to other datasets to help protect their privacy.
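To make "de-identification and cleaning" concrete, here is a minimal sketch of one common approach: dropping direct identifiers, replacing a record key with a salted hash, and trimming whitespace. This is not the repository's actual algorithm, which is not described in this README. The column names, the `scrub_record` helper, the salt, and the sample record are all made up for illustration.

```python
import hashlib

# Hypothetical column names; the real dataset's columns are not listed in this README.
DIRECT_IDENTIFIERS = {"name", "phone"}  # removed entirely
PSEUDONYMIZE = {"provider_id"}          # replaced by a salted hash


def scrub_record(record: dict, salt: str) -> dict:
    """Return a copy of `record` with identifiers dropped or pseudonymized."""
    scrubbed = {}
    for column, value in record.items():
        if column in DIRECT_IDENTIFIERS:
            continue  # drop direct identifiers
        if column in PSEUDONYMIZE:
            digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
            scrubbed[column] = digest[:16]  # truncated only for readability
        else:
            # Basic cleaning: strip surrounding whitespace from text values.
            scrubbed[column] = value.strip() if isinstance(value, str) else value
    return scrubbed


# Made-up sample record, for illustration only.
record = {"name": "Jane Doe", "phone": "555-0100", "provider_id": "12345", "state": " CA "}
print(scrub_record(record, salt="example-salt"))
```

The same salt maps the same identifier to the same pseudonym, so records can still be linked after scrubbing. The salt therefore has to be kept secret.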

How to use?

python main.py -f Input_files/records.csv -o output_file_name

-f, --input-file-path: Input file path
-o, --output-file-name: Output file name
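The two options above could be parsed along these lines. This is a sketch that assumes `main.py` uses the standard `argparse` module; the actual script may be written differently. The `build_parser` helper and the choice to make both options required are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two documented options (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(
        description="Scrub system for de-identification (illustrative sketch)."
    )
    # Marking both options as required is an assumption, not confirmed by the README.
    parser.add_argument("-f", "--input-file-path", required=True, help="Input file path")
    parser.add_argument("-o", "--output-file-name", required=True, help="Output file name")
    return parser


# Parse the same arguments as the command shown above.
args = build_parser().parse_args(
    ["-f", "Input_files/records.csv", "-o", "output_file_name"]
)
print(args.input_file_path)   # Input_files/records.csv
print(args.output_file_name)  # output_file_name
```

`argparse` turns the dashes in long option names into underscores, which is why the values are read as `args.input_file_path` and `args.output_file_name`.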

Demo

[Image: sample output]

The above image is an illustration of the output.

Note: The program takes three inputs from the user, as highlighted in the above image. Based on these inputs, it makes its decision and displays the output.

Check out the complete demo with explanation here

Dataset used

An open-source medical dataset named "Electronic Health Record (EHR) Incentive Program Payments for Eligible Providers", taken from here
