SR3

Reimplementation of 4x SR3 https://arxiv.org/abs/2104.07636

The UNet structure is almost the same as in the vanilla DDPM, with three changes: self-attention is applied at the last depth and the depth right before it, group normalization uses 8 groups instead of 32, and the linear scale of the embedding generation module is changed from 10,000 to 5,000. As described in the paper, a gamma value is sampled uniformly between the two alpha values at t-1 and t, and the square root of gamma is fed directly into the embedding generation module.
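The noise-level conditioning described above can be sketched in plain Python. This is a minimal illustration, not the repository's code: the function names, the `alpha_bar` indexing (cumulative alpha values with `alpha_bar[0] == 1.0`), and the reading of 5,000 as the base of a sinusoidal frequency term are all assumptions made for the example.

```python
import math
import random


def sample_sqrt_gamma(alpha_bar, t, rng=random):
    """Draw gamma uniformly between the values at t-1 and t, then return sqrt(gamma).

    Assumption: alpha_bar[0] == 1.0 (no noise) and alpha_bar[t] is the
    cumulative alpha value after t steps.
    """
    lo, hi = alpha_bar[t], alpha_bar[t - 1]  # values shrink as t grows
    gamma = rng.uniform(lo, hi)
    return math.sqrt(gamma)


def noise_level_embedding(sqrt_gamma, dim, scale=5000.0):
    """Sinusoidal embedding of a scalar noise level.

    Assumption: `scale` plays the role that 10,000 plays in the usual
    sinusoidal embedding; here it defaults to 5,000.
    """
    half = dim // 2
    freqs = [math.exp(-math.log(scale) * i / half) for i in range(half)]
    sines = [math.sin(sqrt_gamma * f) for f in freqs]
    cosines = [math.cos(sqrt_gamma * f) for f in freqs]
    return sines + cosines
```

In a training step, `sample_sqrt_gamma` would be called once per example with a randomly chosen `t`, and the result passed both to the noising step and to `noise_level_embedding`.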

Result

64x64 to 256x256 Model

A. Settings

Tag	Setting
Base Channel	56
Train Batch Size	4
Train Iterations	500K
Train Data	DIV2K Train Set + Flickr2K Train Set (images 1001 to 2650)
Validation Data	DIV2K Validation Set
Test Data	Flickr2K Train Set (images 1 to 1000)
Train Data Augmentation	Random Crop, Random Flip, Random Rotation
Test Data Augmentation	Center Crop
Train Learning Rate Schedule	Cosine Annealing Schedule from 1e-5 to 1e-7
Train Beta Schedule	Linear Schedule from 1e-4 to 0.005
Sample Gamma Schedule	Linear Schedule from 1e-4 to 0.1
Train Steps	1000
Sample Steps	100
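As a rough illustration of the training noise schedule in the table, the sketch below builds a linear beta schedule from 1e-4 to 0.005 over 1000 steps and the corresponding cumulative alpha values. The helper names are hypothetical, and the endpoint-inclusive spacing is an assumption. How the sampling gamma schedule maps onto these quantities is not spelled out here, so the sketch leaves it out.

```python
def linear_schedule(start, end, steps):
    """Return `steps` evenly spaced values from start to end, inclusive."""
    if steps == 1:
        return [start]
    step = (end - start) / (steps - 1)
    return [start + step * i for i in range(steps)]


def cumulative_alpha(betas):
    """Running product of (1 - beta), prefixed with 1.0 for the noise-free step 0."""
    out = [1.0]
    for beta in betas:
        out.append(out[-1] * (1.0 - beta))
    return out


# Values taken from the settings table: Train Beta Schedule and Train Steps.
train_betas = linear_schedule(1e-4, 0.005, 1000)
train_alpha_bar = cumulative_alpha(train_betas)
```

With this layout, `train_alpha_bar[t - 1]` and `train_alpha_bar[t]` are the two bounds between which a noise level for step `t` would be drawn.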

B. Scores

Dataset	IS (Mean, Std.)	FID	PSNR	SSIM
Center crop 64x64 to 256x256	(12.829, 0.992)	3.642	23.185	0.564
Center crop 256x256 to 1024x1024	(21.305, 2.290)	0.312	23.819	0.617

Note that this model was not trained on 256x256 to 1024x1024.

The Inception Score is low because cropped images are hard to recognize as objects. As the crop size increases, the Inception Score also increases.
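For readers unfamiliar with the PSNR column in the scores table, the standard definition is 10 * log10(MAX^2 / MSE). The sketch below computes it on flat lists of pixel values. The function name and the 0 to 255 value range are assumptions for the example; the value range and color space used for the reported scores are not stated here.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher value means the super-resolved sample is closer to the HR image pixel by pixel, which is a different notion of quality from the one IS and FID measure.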

C. Samples

Note that the LR images below have been upsampled with bicubic interpolation.

Validation (64x64 to 256x256)

Tag	Image
LR
Sample
HR

Test (64x64 to 256x256)

Tag	Image
LR
Sample
HR

Test (256x256 to 1024x1024)

Tag	Image
LR
Sample
HR

32x32 to 128x128 Model

Dataset	IS (Mean, Std.)	FID	PSNR	SSIM
Center crop 32x32 to 128x128	(7.159, 0.437)	8.177	23.609	0.563

A. Settings

Tag	Setting
Base Channel	64
Train Batch Size	12
Train Iterations	500K
Train Data	DIV2K Train Set + Flickr2K Train Set (images 1001 to 2650)
Validation Data	DIV2K Validation Set
Test Data	Flickr2K Train Set (images 1 to 1000)
Train Data Augmentation	Random Crop, Random Flip, Random Rotation
Test Data Augmentation	Center Crop
Train Learning Rate Schedule	Cosine Annealing Schedule from 1e-5 to 1e-7
Train Beta Schedule	Linear Schedule from 1e-4 to 0.005
Sample Gamma Schedule	Linear Schedule from 1e-6 to 0.05
Train Steps	1000
Sample Steps	100
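Both settings tables list a cosine annealing learning-rate schedule from 1e-5 to 1e-7 over 500K iterations. A minimal sketch of that curve follows, assuming a single cycle with no warmup or restarts; the function name and defaults are illustrative rather than taken from the training code.

```python
import math


def cosine_annealing_lr(step, total_steps=500_000, lr_max=1e-5, lr_min=1e-7):
    """Learning rate at `step` for a single cosine cycle from lr_max down to lr_min."""
    # Clamp so that steps outside [0, total_steps] stay at the endpoints.
    progress = min(max(step, 0), total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))
```

In PyTorch, `torch.optim.lr_scheduler.CosineAnnealingLR` with `T_max` set to the iteration count and `eta_min` set to the final rate follows the same formula, although whether this project uses that scheduler is not stated here.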

C. Samples

Note that the LR images below have been upsampled with bicubic interpolation.

Validation (32x32 to 128x128)

Tag	Image
LR
Sample
HR

Test (32x32 to 128x128)

Tag	Image
LR
Sample
HR
