frapez1 / SPIC

Semantic-Preserving Image Coding based on Conditional Diffusion Models (SPIC)

Francesco Pezone, Osman Musa, Giuseppe Caire, Sergio Barbarossa


Rather than aiming at a bit-by-bit recovery of the transmitted messages, semantic communication focuses on the meaning and the goal of the communication itself. In this paper, we propose a novel semantic image coding scheme that preserves the semantic content of an image while ensuring a good trade-off between coding rate and image quality. The transmitter of the proposed Semantic-Preserving Image Coding based on Conditional Diffusion Models (SPIC) scheme encodes a Semantic Segmentation Map (SSM) and a low-resolution version of the image to be transmitted. The receiver then reconstructs a high-resolution image using a Denoising Diffusion Probabilistic Model (DDPM) doubly conditioned on the SSM and the low-resolution image. As shown by the numerical examples, compared to state-of-the-art (SOTA) approaches, the proposed SPIC exhibits a better balance between the conventional rate-distortion trade-off and the preservation of semantically relevant features.


  • Operating System: Linux
  • Python Version: Python 3
  • Hardware: CPU or NVIDIA GPU with CUDA and cuDNN

Dataset Preparation

To utilize the Cityscapes dataset:

  1. Follow the instructions provided by SPADE for downloading and preparing the dataset.
  2. Optionally, create a coarse version of the original images. By default, when applying the Semantic-Conditioned Super-Resolution Diffusion Model, the coarse images are generated using the BPG compression algorithm at compression level 35.
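The optional coarse-image step could be scripted roughly as follows. This is a minimal sketch, assuming the `bpgenc` and `bpgdec` command-line tools from libbpg are on the PATH and that "compression level 35" corresponds to the `-q` quantizer setting. The function names, the output file naming, and the choice to decode back to PNG are illustrative assumptions, not part of this repository.

```python
import subprocess
from pathlib import Path


def bpg_commands(src_png, work_dir, quantizer=35):
    """Build the encode/decode commands for one image (nothing is executed here)."""
    src = Path(src_png)
    bpg_path = Path(work_dir) / (src.stem + ".bpg")
    coarse_png = Path(work_dir) / (src.stem + "_coarse.png")
    # Assumption: the compression level maps to bpgenc's -q quantizer option.
    encode = ["bpgenc", "-q", str(quantizer), "-o", str(bpg_path), str(src)]
    decode = ["bpgdec", "-o", str(coarse_png), str(bpg_path)]
    return encode, decode, bpg_path, coarse_png


def make_coarse_image(src_png, work_dir, quantizer=35):
    """Compress with BPG, then decode so the coarse image can be read as a PNG."""
    Path(work_dir).mkdir(parents=True, exist_ok=True)
    encode, decode, bpg_path, coarse_png = bpg_commands(src_png, work_dir, quantizer)
    subprocess.run(encode, check=True)  # raises CalledProcessError on failure
    subprocess.run(decode, check=True)
    return bpg_path, coarse_png
```

The size of the intermediate `.bpg` file is what would count toward the coding rate; the decoded PNG is only a convenient form for feeding the coarse image to the model.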

Training and Test


Download the dataset and initiate training with the following command:

python3 --data_dir ./data/cityscapes  --dataset_mode cityscapes --lr 1e-4 --batch_size 8 --attention_resolutions 32,16,8 --diffusion_steps 1000  --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True --use_checkpoint True --num_classes 19  --class_cond True --large_size 128 --small_size 64 --no_instance True


To fine-tune the model:

python3 --data_dir ./data/cityscapes  --dataset_mode cityscapes --lr 1e-4 --batch_size 8 --attention_resolutions 32,16,8 --diffusion_steps 1000  --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True --use_checkpoint True --num_classes 19 --class_cond True --large_size 128 --small_size 64 --no_instance True --resume_checkpoint ./Checkpoints/


To test the model:

python3 --data_dir ./data/cityscapes --dataset_mode cityscapes --batch_size 2 --attention_resolutions 32,16,8 --diffusion_steps 1000 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True --num_classes 19 --class_cond True --large_size 128 --small_size 64 --no_instance True --num_samples 60 --s 1.5 --max_iter_time 200 --timestep_respacing 100 --model_path ./Checkpoints/ --results_path ./Result/
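In the test command, `--diffusion_steps 1000` combined with `--timestep_respacing 100` means that sampling visits only 100 of the 1000 training timesteps. The snippet below is a simplified sketch of an evenly spaced selection in the spirit of guided-diffusion's respacing, assuming a single section (a plain integer respacing value). `respaced_timesteps` is an illustrative name, not a function from this repository.

```python
def respaced_timesteps(num_train_steps=1000, num_sample_steps=100):
    """Pick num_sample_steps evenly spaced indices from range(num_train_steps)."""
    if not 1 <= num_sample_steps <= num_train_steps:
        raise ValueError("num_sample_steps must be between 1 and num_train_steps")
    if num_sample_steps == 1:
        return [0]
    # Fractional stride so that the first and last training steps are both kept.
    stride = (num_train_steps - 1) / (num_sample_steps - 1)
    return [round(i * stride) for i in range(num_sample_steps)]


steps = respaced_timesteps(1000, 100)
print(len(steps), steps[0], steps[-1])
```

Fewer sampling steps shorten generation time roughly in proportion, usually at some cost in sample quality.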

Pre-trained Model

Access the pretrained model for the Cityscapes dataset via the link provided below:

Dataset      Download link
Cityscapes   Checkpoint


To evaluate the performance of the model, follow these steps to calculate the Bits Per Pixel (BPP), mean Intersection over Union (mIoU), and the Fréchet Inception Distance (FID):
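For reference, two of these metrics can be written down compactly. The sketch below assumes that the transmitted payload consists of the compressed SSM plus the compressed coarse image, and it computes mIoU from flat sequences of integer class labels. The function names and the `ignore_label=255` convention are illustrative assumptions and do not describe the contents of the evaluation script.

```python
def bits_per_pixel(payload_bytes, width, height):
    """Total transmitted bits divided by the number of pixels in the image."""
    return 8 * sum(payload_bytes) / (width * height)


def mean_iou(pred, target, num_classes=19, ignore_label=255):
    """Mean Intersection over Union across classes that appear in pred or target."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_label or not 0 <= t < num_classes:
            continue  # skip unlabeled or out-of-range ground-truth pixels
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            union[t] += 1
            if 0 <= p < num_classes:
                union[p] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")
```

As a usage example, `bits_per_pixel([ssm_size, coarse_size], 256, 128)` would take the two compressed file sizes in bytes, and `mean_iou` would be called on the flattened label maps predicted from the original and the reconstructed images.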

Pre-requisites for Evaluation

Ensure you have the following components installed and set up:

  1. Semantic Segmentation Algorithm: used to generate the Semantic Segmentation Maps (SSMs). We used InternImage.
  2. FLIF Compression Tool: FLIF, used for the lossless compression of the SSMs.
  3. FID Calculation Tool: install pytorch_fid to compute the Fréchet Inception Distance.
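As a rough illustration of how tools 2 and 3 are typically invoked, the sketch below builds a FLIF encode command and a pytorch_fid command. It assumes that the `flif` binary is on the PATH and that pytorch_fid is installed in the current Python environment. The helper names and any paths passed to them are hypothetical.

```python
import os
import subprocess
import sys


def flif_encode_command(ssm_png, out_flif):
    """Command for lossless FLIF encoding of one segmentation map."""
    return ["flif", "-e", ssm_png, out_flif]


def fid_command(real_dir, generated_dir):
    """Command that runs pytorch_fid on two image folders."""
    return [sys.executable, "-m", "pytorch_fid", real_dir, generated_dir]


def compressed_size_bits(ssm_png, out_flif):
    """Encode the SSM with FLIF and return the size of the result in bits."""
    subprocess.run(flif_encode_command(ssm_png, out_flif), check=True)
    return 8 * os.path.getsize(out_flif)
```

Running `subprocess.run(fid_command(real_dir, generated_dir), check=True)` prints the FID between the two folders to standard output.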

Steps for Evaluation

  1. Generate Test Results: Before proceeding with the evaluation metrics, generate the test results using the model. Replace ./Checkpoints/ with the path to your model checkpoint and ./Result/ with the path where you want to save the results:
python3 --data_dir ./data/cityscapes --dataset_mode cityscapes --batch_size 2 --attention_resolutions 32,16,8 --diffusion_steps 1000 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True --num_classes 19 --class_cond True --large_size 128 --small_size 64 --no_instance True --num_samples 60 --s 1.5 --max_iter_time 200 --timestep_respacing 100 --model_path ./Checkpoints/ --results_path ./Result/
  2. Configure Evaluation Script: In the evaluate/ file, update the paths to point to your results and the InternImage folder. Specifically, replace path/to/Result and path/to/InternImage with the actual paths on your system.

  3. Run Evaluation Metrics: Execute the following command to calculate the BPP, mIoU, and FID for your generated images:

python evaluate/


Our code is developed based on guided-diffusion and Semantic Image Synthesis via Diffusion Models.


