This repository contains the code for the paper "Automated detection of unstructured context-dependent sensitive information using deep learning." The project develops an automated system that uses deep learning to detect sensitive information in unstructured text.
The goal is a deep learning model that identifies and classifies sensitive information in unstructured text by analyzing the context in which it appears. The code in this repository serves as a reference for replicating the experiments and methodology presented in the paper.
- Clone the repository to your local machine.
    git clone https://github.com/your-username/automated-sensitive-info-detection.git
- Create and activate a virtual environment (optional but recommended).
    python -m venv env
    source env/bin/activate  # for Linux/Mac
    env\Scripts\activate     # for Windows
- Install the required libraries.
    pip install -r requirements.txt
The project structure is organized as follows:
automated-sensitive-info-detection/
├── data/
│   └── dataset.csv
├── src/
│   ├── rnn_model.py
│   ├── cnn_model.py
│   ├── statistical_model.py
│   ├── tfidf_model.py
│   └── preprocessing.py
- data/: Contains the dataset used for training and evaluating the models.
- src/: Contains the source code for the models used to detect sensitive information (RNN, CNN, statistical, and TF-IDF), along with the preprocessing code.
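As a quick sanity check after cloning, the layout above can be verified with a short script. This is only a sketch: the file list is copied from the tree shown above, while the function name `missing_files` and the constant `EXPECTED_FILES` are hypothetical and not part of the repository.

```python
from pathlib import Path

# Paths copied from the project tree above, relative to the repository root.
# The names EXPECTED_FILES and missing_files are illustrative only.
EXPECTED_FILES = [
    "data/dataset.csv",
    "src/rnn_model.py",
    "src/cnn_model.py",
    "src/statistical_model.py",
    "src/tfidf_model.py",
    "src/preprocessing.py",
]


def missing_files(repo_root="."):
    """Return the expected files that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print(f"Missing: {missing}")
    else:
        print("All expected files found.")
```

Run it from the repository root; an empty result means every file listed in the tree is present.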
The project has the following dependencies:
- gensim
- nltk
- TensorFlow
- textblob
- en_core_web_sm (spaCy's small English pipeline)
Please refer to the requirements.txt file for a complete list of dependencies with their versions.
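After installation, one way to confirm that the dependencies listed above are available in the active environment is a small import check. The sketch below assumes the usual import names for these packages (for example, TensorFlow is imported as `tensorflow`); the helper name `check_dependencies` is hypothetical and not part of the repository.

```python
from importlib.util import find_spec

# Import names assumed for the dependencies listed above.
# TensorFlow is imported as `tensorflow`; en_core_web_sm is installed
# as an importable package alongside spaCy.
DEPENDENCIES = ["gensim", "nltk", "tensorflow", "textblob", "en_core_web_sm"]


def check_dependencies(names=DEPENDENCIES):
    """Return a dict mapping each import name to True if it can be located."""
    # find_spec locates a top-level module without importing it.
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

The check only confirms that each package can be located; it does not verify versions, which are pinned in requirements.txt.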
This project is licensed under the GNU License. You may use and modify this code under the terms of that license. For more details about the research, please refer to the associated paper.