maksimt / empirical_privacy

An empirical approach to measuring the privacy preserved in models of sensitive data

empirical_privacy

This repository contains the code necessary to reproduce results in our paper Empirical Methods for Estimating Privacy.

  • We use docker to create a reproducible execution environment.
  • We use luigi to express our privacy estimation algorithms as a DAG of dependencies. We also use luigi to define our experiments. The advantages of expressing the experiments and the privacy estimation algorithms as a DAG are that the luigi scheduler can avoid re-computing previously computed results, and that it can compute independent nodes in parallel.
  • Each experiment described in our paper has a corresponding jupyter notebook. Each experiment can be entirely reproduced by selecting "Kernel > Restart & Run All".

We hope that our privacy estimation algorithms and experiment framework can be used to study 'privacy problem settings' that we haven't thought of. :)

Requirements to run:

  1. docker.

Getting started

  1. Clone the repository, then build and run the Docker container (the build takes 5-10 min):
git clone https://github.com/maksimt/empirical_privacy
cd empirical_privacy
docker-compose up
  2. Navigate to the jupyter-notebook running inside the docker container.
  3. Get the jupyter token from the console output.
  4. Navigate to 127.0.0.1:8888 and enter the token you just got.
  5. Open Notebooks/Experiment 1 -- Bootstrap Validation.ipynb.ipynb and run the cells in order from top to bottom.

Using the luigi-based sampling framework for your own empirical privacy experiments

luigi is a Python-based dependency specification framework. It provides a central scheduler that makes it easy to parallelize the execution of a computation graph while ensuring that work isn't duplicated and hardware is fully utilized.

We provide a framework that orchestrates the experiments needed to measure empirical privacy. The goal is to minimize the amount of code that needs to be written for a new problem setting, and to take care of the implementation and testing of the key algorithms.

  1. The main task is to implement a GenSample subclass that overrides the gen_sample(sample_number) method. See the one-bit-sum example to start out, and then see row_distributed_svd. Problem-specific parameters can be passed in the dataset_settings parameter.
  2. Once that's done you can use the build_convergence_curve_helper to build an end-to-end pipeline with sensible defaults, or you can customize it by overriding the classes in the framework.
  3. To compute the targets using luigi they must be passed to luigi.build (see Notebooks for examples). These will typically need to communicate with a luigi scheduler server, which you can run by opening a terminal from Jupyter and running luigid. The scheduler will show you the progress of your computation on localhost:8082.
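To give a feel for step 1, here is a self-contained sketch of the shape such a subclass might take. It does not import the repository. `GenSampleSketch` is a stand-in base class written only for this example, the `dataset_settings` keys (`n_trials`, `prob_success`, `fixed_bit`) are hypothetical, and the seeding scheme is an assumption. Only the names `GenSample`, `gen_sample(sample_number)` and `dataset_settings` come from the text above, and the repository's actual one-bit-sum example may differ.

```python
import random


class GenSampleSketch:
    """Stand-in for the framework's GenSample base class (an assumption,
    not the real class from the repository)."""

    def __init__(self, dataset_settings, random_seed=0):
        self.dataset_settings = dataset_settings
        self.random_seed = random_seed

    def gen_sample(self, sample_number):
        raise NotImplementedError


class OneBitSumSketch(GenSampleSketch):
    """Toy problem setting: release the sum of n_trials random bits plus one
    fixed bit. Two adjacent datasets differ only in the value of fixed_bit."""

    def gen_sample(self, sample_number):
        s = self.dataset_settings
        # Derive the RNG seed from (random_seed, sample_number) so that any
        # individual sample can be regenerated on demand.
        rng = random.Random(self.random_seed * 1_000_003 + sample_number)
        random_bits = sum(
            rng.random() < s["prob_success"] for _ in range(s["n_trials"])
        )
        return random_bits + s["fixed_bit"]


# Hypothetical settings for the two adjacent datasets.
settings_0 = {"n_trials": 20, "prob_success": 0.5, "fixed_bit": 0}
settings_1 = dict(settings_0, fixed_bit=1)

gen_0 = OneBitSumSketch(settings_0, random_seed=7)
gen_1 = OneBitSumSketch(settings_1, random_seed=7)

samples_0 = [gen_0.gen_sample(i) for i in range(5)]
samples_1 = [gen_1.gen_sample(i) for i in range(5)]
```

Both generators share a seed here only so that the effect of the fixed bit is easy to see: each sample from `gen_1` is exactly one larger than the matching sample from `gen_0`. An actual experiment would draw the two sets of samples independently.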

You may also be interested in my notes on integrating with PyCharm.

License: MIT License


Languages

Jupyter Notebook 96.2%, Python 3.7%, Dockerfile 0.0%