kornosk / GDPR-similarity-comparison

This repo aims to extract pieces of GDPR-like content and form well-structured data for easy processing. We measure the similarity between GDPR-like laws from different countries.

Home Page: https://arxiv.org/abs/2105.10117

GDPR Similarity Comparison

This repo is part of the report - Towards Automatic Comparison of Data Privacy Documents: A Preliminary Experiment on GDPR-like Laws 🔥

  • We extract information from GDPR-like documents from different countries, written in natural language, and construct well-structured data.

  • The structured data have 4 columns: chapter, section, article and recital. This could benefit any future work that would like to explore GDPR-like laws using computational methods. 🚀

  • This project is inspired by COSC-824 Data Protection by Design, Department of Computer Science at Georgetown University.
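
To make the four-column layout concrete, here is a minimal sketch that builds a tiny in-memory CSV with those columns and groups its rows by chapter. Only the column names come from this README; the row values are made-up placeholders and do not reflect the actual contents of the converted files. It uses only the standard library so it runs without the repo's data.

```python
import csv
import io
from collections import defaultdict

# Column names as listed in this README.
COLUMNS = ["chapter", "section", "article", "recital"]

# Placeholder rows (hypothetical values, not taken from any real law).
placeholder_rows = [
    {"chapter": "CHAPTER I", "section": "", "article": "Article 1",
     "recital": "Placeholder text for the first provision."},
    {"chapter": "CHAPTER I", "section": "", "article": "Article 2",
     "recital": "Placeholder text for the second provision."},
    {"chapter": "CHAPTER II", "section": "Section 1", "article": "Article 3",
     "recital": "Placeholder text for the third provision."},
]

# Write the rows to an in-memory CSV, then read them back as dictionaries.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
writer.writerows(placeholder_rows)
buffer.seek(0)
rows = list(csv.DictReader(buffer))

# Group article labels by chapter to show how the structure can be navigated.
articles_by_chapter = defaultdict(list)
for row in rows:
    articles_by_chapter[row["chapter"]].append(row["article"])

for chapter, articles in articles_by_chapter.items():
    print(chapter, articles)
```

The same grouping should carry over to the real CSV files, assuming their cells follow a similar chapter/article hierarchy.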

Data

We convert each document from PDF to Docx to CSV in a well-structured format. Our data currently include GDPR-like documents from:

  • European Union 🇪🇺
  • Brazil 🇧🇷
  • India 🇮🇳
  • What next? 😉

Simply load the data into a dataframe in Python with the following code.

import pandas as pd

file_path = "data/LGPD-ES-Brazil-converted.csv"
df = pd.read_csv(file_path) # columns: ["chapter", "section", "article", "recital"]
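
Once the text is loaded, two passages can be compared with a similarity score. The sketch below uses a plain bag-of-words cosine similarity as a stand-in; it is not necessarily the measure used in the paper, and the function names and example sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    shared_words = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared_words)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_a * norm_b)


# Hypothetical snippets standing in for text from two different laws.
text_x = "personal data shall be processed lawfully"
text_y = "processing of personal data shall be lawful"
score = cosine_similarity(text_x, text_y)
print(round(score, 3))
```

Text taken from the loaded dataframe could be passed to `cosine_similarity` in the same way. Word counts ignore word order and synonyms, so a score like this is only a rough first signal.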

Materials

Project Members

  • Kornraphop Kawintiranon - Github
  • Yaguang Liu - Github
  • Prof. Benjamin E. Ujcich (Instructor) - Personal

Citation

If you feel our paper and resources are useful and encouraging, please consider citing our work! 🙏

@article{kawintiranon2021automatic,
    title={Towards Automatic Comparison of Data Privacy Documents: A Preliminary Experiment on GDPR-like Laws},
    author={Kawintiranon, Kornraphop and Liu, Yaguang},
    journal={arXiv preprint arXiv:2105.10117},
    year={2021},
    url={https://arxiv.org/abs/2105.10117}
}

References

License: MIT License


Languages

Language: Python 100.0%