data61 / clkhash

CLK hash: hash PII for entity matching

CLK Hash

clkhash is a Python implementation of cryptographic linkage key hashing as described by Rainer Schnell, Tobias Bachteler, and Jörg Reiher in A Novel Error-Tolerant Anonymous Linking Code.
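To give a feel for the idea behind a cryptographic linkage key, here is a minimal standard-library sketch. It is not clkhash's implementation: the function names (`bigrams`, `toy_clk`, `dice`), the filter length of 1024 bits, the 20 hash positions per bigram, and the HMAC-SHA1/HMAC-MD5 double-hashing choice are all assumptions made for illustration. Each field is split into bigrams, every bigram is hashed with a shared secret into one Bloom filter, and two records are compared with the Dice coefficient.

```python
import hashlib
import hmac


def bigrams(value):
    """Split a string into overlapping 2-grams, padded with one space each side."""
    padded = f" {value.lower()} "
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def toy_clk(fields, secret, length=1024, k=20):
    """Encode all fields of a record into one Bloom filter (a set of bit positions).

    Hypothetical helper for illustration only; not part of clkhash.
    """
    key = secret.encode("utf-8")
    bits = set()
    for field in fields:
        for gram in bigrams(field):
            data = gram.encode("utf-8")
            # Double hashing: combine two keyed hashes to derive k bit positions.
            h1 = int.from_bytes(hmac.new(key, data, hashlib.sha1).digest(), "big")
            h2 = int.from_bytes(hmac.new(key, data, hashlib.md5).digest(), "big")
            for i in range(k):
                bits.add((h1 + i * h2) % length)
    return bits


def dice(a, b):
    """Dice coefficient of two bit sets: 2 * |a & b| / (|a| + |b|)."""
    return 2 * len(a & b) / (len(a) + len(b))


original = toy_clk(["Jonathan", "Smith"], "secret")
typo = toy_clk(["Jonathon", "Smith"], "secret")
other = toy_clk(["Priya", "Natarajan"], "secret")

# A one-letter typo should leave the encodings far more similar
# than those of two unrelated records.
print(dice(original, typo), dice(original, other))
```

Because only bigrams that differ change the bits that are set, a small spelling error shifts a few positions rather than the whole encoding, which is what makes the comparison error-tolerant. For real data, use clkhash itself rather than this sketch.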

Installation

Install clkhash with all dependencies using pip:

pip install clkhash

Documentation

https://clkhash.readthedocs.io

Python API

To hash a CSV file of entities using the default schema:

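from clkhash import clk, randomnames
fake_pii_schema = randomnames.NameList.SCHEMA
# Open the CSV in a context manager so the file is closed once hashing finishes.
with open('fake-pii-out.csv', 'r') as csv_file:
    clks = clk.generate_clk_from_csv(csv_file, 'secret', fake_pii_schema)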

Command Line Interface

See Anonlink Client for a command line interface to clkhash.

Citing

Clkhash and the wider Anonlink project are designed, developed and supported by CSIRO's Data61. If you use any part of this library in your research, please cite it using the following BibTeX entry:

@misc{Anonlink,
  author = {CSIRO's Data61},
  title = {Anonlink Private Record Linkage System},
  year = {2017},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/data61/clkhash}},
}

About

License: Apache License 2.0

