DIAGNOSIS

This repository is the source code for "DIAGNOSIS: Detecting Unauthorized Data Usages in Text-to-image Diffusion Models" (ICLR 2024).

Environment

See requirements.txt for the required packages.

Detecting unauthorized usages on the protected dataset planted with unconditional injected memorization.

Planting unconditional injected memorization into the model:

python coating.py --p 1.0 --target_type wanet --unconditional --wanet_s 2 --remove_eval
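
The paths used in the later steps suggest that coating.py names its output directory after the flags it receives. The sketch below reconstructs that pattern purely from the directory names that appear in this README. The function name `coated_dir_name`, its parameters, and the default `k=128` are assumptions made for illustration, not code taken from coating.py.

```python
def coated_dir_name(p, target_type, unconditional=False, wanet_s=None,
                    k=128, remove_eval=False, base="pokemon-blip-captions"):
    """Guess the train directory produced by a coating.py run.

    Assumption: the pattern is inferred only from the paths shown in this
    README; coating.py itself may build the name differently.
    """
    parts = [base, f"p{float(p):.1f}", target_type]
    if target_type != "none":
        # Coated datasets carry the mode, warping strength and grid size.
        if unconditional:
            parts.append("unconditional")
        parts.append(f"s{float(wanet_s):.1f}")
        parts.append(f"k{k}")
    if remove_eval:
        parts.append("removeeval")
    return "./" + "_".join(parts) + "/train/"


# Directory expected after the unconditional coating command
print(coated_dir_name(1.0, "wanet", unconditional=True, wanet_s=2, remove_eval=True))
# Directory expected for an uncoated copy of the dataset
print(coated_dir_name(0.0, "none"))
```

Checking the printed path against the directory that actually appears on disk is a quick way to catch a mismatched flag before setting TRAIN_DATA_DIR.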

Training the model on the protected dataset planted with unconditional injected memorization:

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export TRAIN_DATA_DIR="./pokemon-blip-captions_p1.0_wanet_unconditional_s2.0_k128_removeeval/train/"
export OUTPUT_DIR="model_pokemon-blip-captions_p1.0_wanet_unconditional_s2.0_k128"

CUDA_VISIBLE_DEVICES=0 accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--train_data_dir=$TRAIN_DATA_DIR --caption_column="additional_feature" \
--resolution=512 --random_flip \
--train_batch_size=1 \
--num_train_epochs=100 --checkpointing_steps=5000 \
--learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
--seed=42 \
--output_dir=$OUTPUT_DIR \
--validation_prompt=None --report_to="wandb"
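
To gauge how long this run lasts and how many intermediate checkpoints it writes, the step count can be worked out from the flags. The sketch below assumes the usual behaviour of the diffusers text-to-image example scripts on a single GPU: one optimizer step per batch when gradient accumulation is left at 1, and a checkpoint every `--checkpointing_steps` steps. The dataset size of 800 images is a made-up placeholder, not a figure from this repository.

```python
import math


def training_schedule(num_images, train_batch_size=1, num_train_epochs=100,
                      checkpointing_steps=5000, gradient_accumulation_steps=1):
    """Return (total optimizer steps, number of intermediate checkpoints)."""
    batches_per_epoch = math.ceil(num_images / train_batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
    total_steps = steps_per_epoch * num_train_epochs
    # A checkpoint is written each time the step counter hits a multiple
    # of checkpointing_steps.
    num_checkpoints = total_steps // checkpointing_steps
    return total_steps, num_checkpoints


# Hypothetical dataset size; replace with the number of images in TRAIN_DATA_DIR.
total_steps, num_checkpoints = training_schedule(num_images=800)
print(total_steps, num_checkpoints)
```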

Tracing unauthorized data usages.

First, generate a set of samples using the inspected model:

export MODEL_PATH="model_pokemon-blip-captions_p1.0_wanet_unconditional_s2.0_k128"
export SAVE_PATH="./generated_imgs_pokemon-blip-captions_p1.0_wanet_unconditional_s2.0_k128/"

CUDA_VISIBLE_DEVICES=0 python generate.py --model_path $MODEL_PATH --save_path $SAVE_PATH

Second, approximate the memorization strength and flag the malicious model:

Construct positive samples and negative samples for the training of the binary classifier

python coating.py --p 1.0 --target_type wanet --unconditional --wanet_s 2

python coating.py --p 0.0 --target_type none

Train binary classifier and approximate the memorization strength

export ORI_DIR="./pokemon-blip-captions_p0.0_none/train/"
export COATED_DIR="./pokemon-blip-captions_p1.0_wanet_unconditional_s2.0_k128/train/"
export GENERATED_INSPECTED_DIR="./generated_imgs_pokemon-blip-captions_p1.0_wanet_unconditional_s2.0_k128/"

CUDA_VISIBLE_DEVICES=0 python binary_classifier.py --ori_dir $ORI_DIR \
--coated_dir $COATED_DIR \
--generated_inspected_dir $GENERATED_INSPECTED_DIR
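
One natural reading of this step is that the memorization strength is estimated as the share of generated images that the classifier assigns to the coated class. The sketch below illustrates that idea on plain Python lists. It is not the logic of binary_classifier.py: the function names, the 0/1 label encoding, and the 0.5 threshold are assumptions chosen for illustration.

```python
def memorization_strength(predictions):
    """Fraction of generated images predicted as coated (label 1)."""
    if not predictions:
        raise ValueError("need at least one prediction")
    return sum(1 for label in predictions if label == 1) / len(predictions)


def flag_model(predictions, threshold=0.5):
    """Return (strength, flagged); the threshold is a hypothetical choice."""
    strength = memorization_strength(predictions)
    return strength, strength > threshold


# Toy predictions standing in for classifier outputs on generated images.
toy_predictions = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
strength, flagged = flag_model(toy_predictions)
print(f"approximate memorization strength: {strength:.2f}, flagged: {flagged}")
```

A model trained on unprotected data should yield a strength close to the classifier's false-positive rate, so the threshold has to sit comfortably above that rate to be meaningful.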

Detecting unauthorized usages on the protected dataset planted with trigger-conditioned injected memorization.

Planting trigger-conditioned injected memorization into the model:

python coating.py --p 0.2 --target_type wanet --wanet_s 1 --remove_eval

Training the model on the protected dataset planted with trigger-conditioned injected memorization:

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export TRAIN_DATA_DIR="./pokemon-blip-captions_p0.2_wanet_s1.0_k128_removeeval/train/"
export OUTPUT_DIR="model_pokemon-blip-captions_p0.2_wanet_s1.0_k128"

CUDA_VISIBLE_DEVICES=0 accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--train_data_dir=$TRAIN_DATA_DIR --caption_column="additional_feature" \
--resolution=512 --random_flip \
--train_batch_size=1 \
--num_train_epochs=100 --checkpointing_steps=5000 \
--learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
--seed=42 \
--output_dir=$OUTPUT_DIR \
--validation_prompt=None --report_to="wandb"

Tracing unauthorized data usages.

First, generate a set of samples using the inspected model:

export MODEL_PATH="model_pokemon-blip-captions_p0.2_wanet_s1.0_k128"
export SAVE_PATH="./generated_imgs_pokemon-blip-captions_p0.2_wanet_s1.0_k128/"

CUDA_VISIBLE_DEVICES=0 python generate.py --model_path $MODEL_PATH --save_path $SAVE_PATH

Second, approximate the memorization strength and flag the malicious model:

Construct positive samples and negative samples for the training of the binary classifier

python coating.py --p 1.0 --target_type wanet --unconditional --wanet_s 1

python coating.py --p 0.0 --target_type none

Train binary classifier and approximate the memorization strength

export ORI_DIR="./pokemon-blip-captions_p0.0_none/train/"
export COATED_DIR="./pokemon-blip-captions_p1.0_wanet_unconditional_s1.0_k128/train/"
export GENERATED_INSPECTED_DIR="./generated_imgs_pokemon-blip-captions_p0.2_wanet_s1.0_k128/"

CUDA_VISIBLE_DEVICES=0 python binary_classifier.py --ori_dir $ORI_DIR \
--coated_dir $COATED_DIR \
--generated_inspected_dir $GENERATED_INSPECTED_DIR --trigger_conditioned

Running experiments on unprotected dataset.

Get the unprotected dataset:

python coating.py --p 0.0 --target_type none --remove_eval

Training the model on the unprotected dataset:

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export TRAIN_DATA_DIR="./pokemon-blip-captions_p0.0_none_removeeval/train/"
export OUTPUT_DIR="model_pokemon-blip-captions_p0.0_none"

CUDA_VISIBLE_DEVICES=0 accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--train_data_dir=$TRAIN_DATA_DIR --caption_column="additional_feature" \
--resolution=512 --random_flip \
--train_batch_size=1 \
--num_train_epochs=100 --checkpointing_steps=5000 \
--learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
--seed=42 \
--output_dir=$OUTPUT_DIR \
--validation_prompt=None --report_to="wandb"

Tracing unauthorized data usages.

First, generate a set of samples using the inspected model:

export MODEL_PATH="model_pokemon-blip-captions_p0.0_none"
export SAVE_PATH="./generated_imgs_pokemon-blip-captions_p0.0_none/"

CUDA_VISIBLE_DEVICES=0 python generate.py --model_path $MODEL_PATH --save_path $SAVE_PATH

Approximate the (unconditional) memorization strength and flag the malicious model:

Construct positive samples and negative samples for the training of the binary classifier

python coating.py --p 1.0 --target_type wanet --unconditional --wanet_s 1

python coating.py --p 0.0 --target_type none

Train binary classifier and approximate the memorization strength

export ORI_DIR="./pokemon-blip-captions_p0.0_none/train/"
export COATED_DIR="./pokemon-blip-captions_p1.0_wanet_unconditional_s1.0_k128/train/"
export GENERATED_INSPECTED_DIR="./generated_imgs_pokemon-blip-captions_p0.0_none/"

CUDA_VISIBLE_DEVICES=0 python binary_classifier.py --ori_dir $ORI_DIR \
--coated_dir $COATED_DIR \
--generated_inspected_dir $GENERATED_INSPECTED_DIR

Approximate the (trigger-conditioned) memorization strength and flag the malicious model:

Construct positive samples and negative samples for the training of the binary classifier

python coating.py --p 1.0 --target_type wanet --unconditional --wanet_s 2

python coating.py --p 0.0 --target_type none

Train binary classifier and approximate the memorization strength

export ORI_DIR="./pokemon-blip-captions_p0.0_none/train/"
export COATED_DIR="./pokemon-blip-captions_p1.0_wanet_unconditional_s2.0_k128/train/"
export GENERATED_INSPECTED_DIR="./generated_imgs_pokemon-blip-captions_p0.0_none/"

CUDA_VISIBLE_DEVICES=0 python binary_classifier.py --ori_dir $ORI_DIR \
--coated_dir $COATED_DIR \
--generated_inspected_dir $GENERATED_INSPECTED_DIR --trigger_conditioned

Acknowledgement

Part of the code is modified from https://github.com/huggingface/diffusers/tree/main/examples/text_to_image.

Cite this work

You are encouraged to cite the following paper if you use the repo for academic research.

@inproceedings{wang2024diagnosis,
  title={DIAGNOSIS: Detecting Unauthorized Data Usages in Text-to-image Diffusion Models},
  author={Wang, Zhenting and Chen, Chen and Lyu, Lingjuan and Metaxas, Dimitris and Ma, Shiqing},
  booktitle={International Conference on Learning Representations},
  year={2024}
}
