Hatins / CIP_master


Categorical Inference Poisoning: Verifiable Defense Against Black-Box DNN Model Stealing Without Constraining Surrogate Data and Query Times

About

PyTorch implementation of the categorical inference poisoning (CIP) framework proposed in the paper.

Training of Victim Model

"Victim model" is equivalent to "to-be-protected model" or "original model" that has been trained using the original training data for the targeted functionality (i.e., classification tasks in this work).

cd model_train
python main.py

Preparation and structure of the dataset:

dataset
|----MNIST
|  |----train
|  |  |----1.png
|  |  |----2.png
|  |  |----...
|  |----test
|----MNIST_half
|  |----train
|  |  |----1.png
|  |  |----2.png
|  |  |----...
|  |----test
|----MNIST_surrogate
|  |----train
|  |  |----1.png
|  |  |----2.png
|  |  |----...
|  |----test
|----Cifar100
|...
|----FOOD101

MNIST_half contains half of MNIST and is used to test IDA and DQA (each category in MNIST_half has half as many images as the corresponding category in MNIST; a sketch for building such a split is given after the trained-model tree below). Note that the attack type must be set before testing:

parser.add_argument('--attack',type=str,default='Datafree', choices=['Knockoff','Datafree','IDA'], help='choosing the attack forms')
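
For example, to evaluate under the KnockoffNets setting:

python main.py --attack Knockoff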

and the final pt file will be saved in

trained
|----Datafree
|  |----without_fine-tuning.pt
|----Knockoff
|----IDA
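
As an aside, a half-per-category split like MNIST_half can be built from torchvision's MNIST. The sketch below is a hypothetical helper (the flat 1.png, 2.png, ... naming follows the tree above; the repository's actual preprocessing may differ):

import os
from collections import Counter, defaultdict
from torchvision.datasets import MNIST

def export_half_split(root='dataset/MNIST_half/train'):
    """Save half of every MNIST category as PNGs (illustrative helper)."""
    os.makedirs(root, exist_ok=True)
    ds = MNIST('data', train=True, download=True)
    per_class = Counter(int(y) for _, y in ds)     # images per category
    kept = defaultdict(int)
    idx = 1
    for img, y in ds:
        if kept[int(y)] < per_class[int(y)] // 2:  # keep half of each category
            img.save(os.path.join(root, f'{idx}.png'))
            kept[int(y)] += 1
            idx += 1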

Boosting OOD detector with OE:

cd ood_detection
python detection_train.py
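
detection_train.py fine-tunes the victim model as an OOD detector using Outlier Exposure (OE). As background, the standard OE objective adds a uniform-distribution penalty on outlier data to the usual classification loss; a minimal sketch (lam and the function names are illustrative, not the repository's API):

import torch
import torch.nn.functional as F

def oe_loss(model, x_in, y_in, x_out, lam=0.5):
    # Usual cross-entropy on in-distribution samples.
    loss = F.cross_entropy(model(x_in), y_in)
    # Cross-entropy to the uniform distribution on outliers:
    # -(1/K) * sum_k log softmax(f)_k == logsumexp(f) - mean(f).
    logits_out = model(x_out)
    loss = loss + lam * (torch.logsumexp(logits_out, dim=1)
                         - logits_out.mean(dim=1)).mean()
    return loss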

Before fine-tuning, the 80 Million Tiny Images file (tiny_images.bin) and the OOD test datasets should be prepared:

dataset
|----Open-set test
|  |----DTD
|  |  |----1.png
|  |  |...
|  |----Place365
|----tiny_images.bin

The path of tiny_images.bin is set on lines 13 and 28 of the file

CIP/Utils/tinyimages_80mn_loader.py
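
In the standard Outlier Exposure loader these are hard-coded open() calls, so the lines to edit look roughly like the following (the path shown is illustrative):

data_file = open('/your/path/dataset/tiny_images.bin', 'rb')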

Note that OOD datasets like DTD are only used to test the performance of the OOD detector; you could use any other type of OOD data, even noise. The final pt file will be saved in

trained
|----Datafree
|  |----after_fine-tuning.pt
|----Knockoff
|----IDA

Testing the performance of the OOD detector and recording the thresholds E1 and E2:

python Detection_test.py

An Excel file will be generated automatically to record the energy threshold values:

   dataset  FPR_energy  open-set_energy
0  Mnist    E1          E2
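
E1 and E2 are energy-score thresholds. For reference, the energy score of a query is the standard E(x) = -T * logsumexp(f(x)/T) from energy-based OOD detection; a minimal sketch (the threshold direction and temperature T = 1 are assumptions):

import torch

def energy_score(logits, T=1.0):
    # E(x) = -T * logsumexp(f(x) / T); OOD inputs tend to get higher energy.
    return -T * torch.logsumexp(logits / T, dim=1)

logits = torch.randn(8, 10)              # e.g., a batch of 10-class logits
is_ood = energy_score(logits) > 0.0      # replace 0.0 with the recorded threshold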

Testing the defense performance against KnockoffNets:

cd Knockoff
python main.py

Data preparation of KnockoffNets:

dataset
|----Imagenet_100K
|  |----1.png
|  |----2.png
|  |...
|  |----100000.png

To test KnockoffNets, we need to collect 100K ImageNet images.

Note that for each attack, we need to decide whether to apply poisoning and choose the poisoning method:

parser.add_argument('--poison', type=bool, default=True, help='whether poisoning')

parser.add_argument('--method', type=str, default='CIP',choices=['CIP','DP','DAWN'])
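
(Aside: argparse's type=bool treats any non-empty string as True, so toggling --poison off from the command line requires editing the default rather than passing --poison False.)

At a high level, the defense flags suspicious (out-of-distribution) queries with the energy detector and returns poisoned categorical outputs for them. The sketch below illustrates that decision only; the poisoning rule here is a placeholder, not the CIP strategy from the paper:

import torch
import torch.nn.functional as F

def defended_predict(model, x, threshold, T=1.0):
    logits = model(x)
    energy = -T * torch.logsumexp(logits / T, dim=1)  # energy score per query
    probs = F.softmax(logits, dim=1)
    ood = energy > threshold                          # assumed threshold direction
    if ood.any():
        perm = torch.randperm(probs.size(1))
        probs[ood] = probs[ood][:, perm]              # placeholder: permute class probabilities
    return probs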

Testing the defense performance against DFME:

cd DF
python main.py

Before testing, we should set the path of the pt file:

parser.add_argument('--pt_file',type=str,default='../Victim_Model_Train/Trained/Datafree/Mnist_resnet18_epoch_64_accuracy_99.61%.pt',help='setting the Victim model')

Note that when testing DP and DAWN, we should use the model without OE fine-tuning; in contrast, when testing CIP, we should use the model after OE fine-tuning.

IDA

cd IDA
python main.py

Data preparation of IDA:

dataset
|----Cifar10_half
|  |----1.png
|  |----2.png
|  |...
|  |----100000.png
|----Cifar100_half
|----FOOD101_half

Cifar10_half in the IDA folder contains the images complementary to those in the Victim_Model_Train folder.

DQA

cd IDA
python main.py

Because DQA and IDA use exactly the same datasets, there is no need to set up a separate dataset for DQA. However, the noise images should be prepared before testing:

DQA
|----Noise
|  |----N1.png
|  |----N2.png
|  |----N3.png
|  |...
|  |----N60000.png
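
The README does not specify the noise distribution; a minimal sketch that writes uniform random-noise PNGs (the 32x32 size is an assumption):

import os
import numpy as np
from PIL import Image

os.makedirs('DQA/Noise', exist_ok=True)
for i in range(1, 60001):  # N1.png ... N60000.png, matching the tree above
    noise = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
    Image.fromarray(noise).save(f'DQA/Noise/N{i}.png')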

Main results

The experimental results of CIP, DAWN, and DP against Knockoff, DFME, IDA, and DQA are shown in the results figure in the repository.

Citation Information

H. Zhang, G. Hua*, X. Wang, H. Jiang, and W. Yang, "Categorical inference poisoning: Verifiable defense against black-box DNN model stealing without constraining surrogate data and query times," IEEE Transactions on Information Forensics and Security, 2023.
