PrivateVision

This Pytorch codebase implements efficient training of differentially private (DP) vision neural networks (CNN, including convolutional Vision Transformers), using mixed ghost per-sample gradient clipping.

❓ What is this?

There are a few DP libraries that change the regular non-private training of neural networks to a privacy-preserving one. Examples include Opacus, FastGradClip, private-transformers, and tensorflow-privacy.

However, they are not suitable for DP training of large CNNs, because they are either not generalizable or computationally inefficient. E.g. causing >20 times memory burden or >5 times slowdown than the regular training.

This codebase implements a new technique --the mixed ghost clipping-- for the convolutional layers, that substantially reduces the space and time complexity of DP deep learning.

🔥 Highlights

We implement a mixed ghost clipping technique for the Conv1d/Conv2d/Conv3d layers, that trains DP CNNs almost as light as (with 0.1%-10% memory overhead) the regular training. This allows us to train 18 times larger batch size on VGG19 and CIFAR10 than Opacus, as well as to train efficiently on ImageNet (224X224) or larger images, which easily cause out of memory error with private-transformers.
Larger batch size can improve the throughput of mixed ghost clipping to be 3 times faster than existing DP training methods. On all models we tested, the slowdown is at most 2 times to the regular training.
We support general optimizers and clipping functions. Loading vision models from codebases such as timm and torchvision, our method can privately train VGG, ResNet, Wide ResNet, ResNeXt, etc. with a few additional lines of code.
We demonstrate DP training of convolutional Vision Transformers (up to 300 million parameters, again 10% memory overhead and less than 200% slowdonw than non-private training). We improve from previous SOTA 67.4% accuracy to 83.0% accuracy at eps=1 on CIFAR100, and to 96.7% accuracy at eps=1 on CIFAR10.

🍻 Examples

To DP training models on CIFAR10 and CIFAR100, one can run

python -m cifar_DP --lr 0.001 --epochs 3 --model beit_large_patch16_224

Arguments:

--lr: learning rate, default is 0.001
--epochs: number of epochs, default is 1
--model: name of models in timm, default is resnet18; see supported models below
--cifar_data: dataset to train, CIFAR10 (default) or CIFAR100
--eps: privacy budget, default is 2
--grad_norm: per-sample gradient clipping norm, default is 0.1
--mode: which DP clipping algorithm to use, one of ghost_mixed(default; the mixed ghost clipping), ghost (the ghost clipping), non-ghost (the Opacus approach), non-private (standard non-DP training)
--bs: logical batch size that determines the convergence and accuracy, but not the memory nor speed; default is 1000
--mini_bs: virtual or physical batch size for the gradient accumulation, which determines the memory and speed of training; default is 50
--pretrained: whether to use pretrained model from timm, default is True

🚀 Getting Started

Privately training vision models is simple:

Create the model and any optimizer
Attach this optimizer to our PrivacyEngine (this essentially adds Pytorch hooks for per-sample clipping)
Compute per-example losses (setting reduction=='none') for a mini-batch of data
Pass the loss to optimizer.step or optimizer.virtual_step without calling the backward function (this is implicitly called inside PrivacyEngine)

Below is a quick example of using our codebase for training CNN models with mixed ghost clipping:

import torchvision, torch, opacus
from private_vision import PrivacyEngine

model = torchvision.models.resnet18()

# replace BatchNorm by GroupNorm or LayerNorm
model=opacus.validators.ModuleValidator.fix(model)

optimizer = torch.optim.Adam(params=model.parameters(), lr=1e-4)
privacy_engine = PrivacyEngine(
    model,
    batch_size=256,
    sample_size=50000,
    epochs=3,
    max_grad_norm=0.1,
    target_epsilon=3,
    ghost_clipping=True,
    mixed=True,
    )
privacy_engine.attach(optimizer)

# Same training procedure, e.g. data loading, forward pass, logits...
loss = F.cross_entropy(model(batch), labels, reduction="none")
# do not use loss.backward()
optimizer.step(loss=loss)

In the above PrivacyEngine, which shares the design of Opacus v0.15 (see how to use keywords like max_grad_norm and target_epsilon in https://github.com/pytorch/opacus/blob/v0.15.0/opacus/privacy_engine.py),

setting ghost_clipping=True, mixed=True implements the best method, mixed ghost clipping;
setting ghost_clipping=True, mixed=False implements the ghost clipping, which are very memory-costly for large images (e.g. failing to fit a single 400X400 image into ResNet18 with a 16GB GPU);
setting ghost_clipping=False implements a similar approach to Opacus, which needs to instantiate the per-sample gradients that are very memory-costly.

A special use of our privacy engine is to use the gradient accumulation. This is achieved with virtual step function.

import torchvision, torch
from private_vision import PrivacyEngine

gradient_accumulation_steps = 10  

# Batch size/physical batch size. Take an update once this many iterations

model = torchvision.models.resnet18()
model=opacus.validators.ModuleValidator.fix(model)
optimizer = torch.optim.Adam(model.parameters())
privacy_engine = PrivacyEngine(...)
privacy_engine.attach(optimizer)

for i, batch in enumerate(dataloader):
    loss = F.cross_entropy(model(batch), labels, reduction="none")
    if i % gradient_accumulation_steps == 0:
        optimizer.step(loss=loss)
        optimizer.zero_grad()
    else:
        optimizer.virtual_step(loss=loss)

📋 Currently Supported Modules

nn.Linear (2D Ian Goodfellow)
nn.Linear (3D Xuechen et al.)
nn.LayerNorm (Opacus)
nn.GroupNorm (Opacus)
nn.Embedding (Xuechen et al.)
nn.Conv1d (this work)
nn.Conv2d (this work)
nn.Conv3d (this work)
nn.Linear (4D; this work)

For unsupported modules, an error message will print out the module names and you need to mannually freeze them to not require gradient.

As a consequence, we can privately train most of the models from timm (this list is non-exclusive):

beit_base_patch16_224, beit_large_patch16_224, cait_s24_224, cait_xxs24_224, convit_base, convit_small, convit_tiny, convnext_base, convnext_large, crossvit_9_240, crossvit_15_240, crossvit_18_240, crossvit_base_240, crossvit_small_240, crossvit_tiny_240, deit3_base_patch16_224, deit_small_patch16_224, deit_tiny_patch16_224,

dla34, dla102, dla169, ecaresnet50d, ecaresnet269d, gluon_resnet18_v1b, gluon_resnet50_v1b, gluon_resnet152_v1b, gluon_resnet152_v1d, gluon_resnet152_v1s, hrnet_w18, hrnet_w48, ig_resnext101_32x8d, inception_v3, jx_nest_base, legacy_senet154, legacy_seresnet18, legacy_seresnet152, mixer_b16_224, mixer_l16_224, pit_b_224, pvt_v2_b1,

resnet18, resnet34, resnet50, resnet101, resnet152, resnext50_32x4d, res2net50_14w_8s, res2next50, resnest50d, seresnet50, seresnext50_32x4d, ssl_resnet50, ssl_resnext50_32x4d, swsl_resnet50, swsl_resnext50_32x4d, tv_resnet152, tv_resnext50_32x4d, twins_pcpvt_base, twins_pcpvt_large, twins_svt_base, twins_svt_large   

vgg11, vgg11_bn, vgg13, vgg16, vgg19, visformer_small, vit_base_patch16_224, vit_base_patch32_224, vit_large_patch16_224, vit_small_patch16_224, vit_tiny_patch16_224, volo_d1_224, wide_resnet50_2, wide_resnet101_2, xception, xcit_large_24_p16_224, xcit_medium_24_p16_224, xcit_small_24_p16_224, xcit_tiny_24_p16_224

We also support models in torchvision and other vision libraries, e.g. densenet121, densnet161, densenet201.

Citation

Please cite our paper if you use PrivateVision in your papers, as follows:

@article{bu2022scalable,
  title={Scalable and efficient training of large convolutional neural networks with differential privacy},
  author={Bu, Zhiqi and Mao, Jialin and Xu, Shiyun},
  journal={Advances in Neural Information Processing Systems},
  volume={35},
  pages={38305--38318},
  year={2022}
}

Acknowledgement

This code is largely based on https://github.com/lxuechen/private-transformers (v0.1.0) and https://github.com/pytorch/opacus (v0.15).

woodyx218 / private_vision