image-colorization

This framework facilitates the training and evaluation of various deep neural networks for the task of image colorization. In particular, it offers the following colorization models, features and evaluation methods:

Colorization models

  • ResNet Colorization Network
  • Conditional GAN (CGAN)
  • U-Net

Evaluation methods and metrics

  • Mean Squared Error (MSE)
  • Mean LPIPS Perceptual Similarity (PS)
  • Semantic Interpretability (SI)

Prerequisites

The framework is implemented in Python (3.6) using PyTorch v1.0.1.

Please consult ./env/mlp_env.yml for a full list of the dependencies of the Conda environment that was used in the development of this framework. If Conda is used as a package and environment manager, one can run conda create --name myenv --file ./env/mlp_env.txt to recreate the aforementioned environment.

Structure

  • train.py - main entry point of the framework
  • src/options.py - parses arguments (e.g. task specification, model options)
  • src/main.py - set-up of task environment (e.g. models, dataset, evaluation method)
  • src/dataloaders.py - downloads and (sub)samples datasets, and provides iterators over the dataset elements
  • src/models.py - contains the implementations of the model architectures
  • src/utils.py - contains various helper functions and classes
  • src/colorizer.py - trains and validates colorization models
  • src/classifier.py - trains and validates image-classification models (used for SI)
  • src/eval_gen - contains helper functions for the evaluation of model colorizations
  • src/eval_mse.py - evaluates colorizations by MSE
  • src/eval_ps.py - evaluates colorizations by the Mean LPIPS Perceptual Similarity (PS)
  • src/eval_si.py - evaluates colorizations by Semantic Interpretability (SI)

Usage

Training of models
python train.py [--option ...] where the options are:

| option | description | type | oneOf | default |
| --- | --- | --- | --- | --- |
| seed | random seed | int | not applicable | 0 |
| task | the task that should be executed | str | ['colorizer', 'classifier', 'eval-gen', 'eval-si', 'eval-ps', 'eval-mse'] | 'colorizer' |
| experiment-name | the name of the experiment | str | not applicable | 'experiment_name' |
| model-name | colorization model architecture that should be used | str | ['resnet', 'unet32', 'unet224', 'nazerigan32', 'nazerigan224', 'cgan'] | 'resnet' |
| model-suffix | colorization model name suffix | str | not applicable | not applicable |
| model-path | path for the pretrained models | str | not applicable | './models' |
| dataset-name | the dataset to use | str | ['placeholder', 'cifar10', 'places100', 'places205', 'places365'] | 'placeholder' |
| dataset-root-path | dataset root path | str | not applicable | './data' |
| use-dataset-archive | load dataset from TAR archive | str2bool | [True, False] | False |
| output-root-path | path for output (e.g. model weights, stats, colorizations) | str | not applicable | './output' |
| max-epochs | maximum number of epochs to train for | int | not applicable | 5 |
| train-batch-size | training batch size | int | not applicable | 100 |
| val-batch-size | validation batch size | int | not applicable | 100 |
| batch-output-frequency | frequency with which to output batch statistics | int | not applicable | 1 |
| max-images | maximum number of images from the validation set to be saved (per epoch) | int | not applicable | 10 |
| eval-root-path | the root path for evaluation images | str | not applicable | './eval' |
| eval-type | the type of evaluation task to perform | str | ['original', 'grayscale', 'colorized'] | 'original' |

So one could, for example, train a cgan colorization model on the places365 dataset for 100 epochs by running:

python train.py \
  --experiment-name cgan_experiment001 \
  --model-name cgan \
  --dataset-name places365 \
  --max-epochs 100 \
  --train-batch-size 16 \
  --val-batch-size 16
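
After training, a model's colorizations could be scored with one of the evaluation tasks from the table above. The invocation below is illustrative only: it simply combines the documented options, and it assumes the image triplets required by the metric tasks have already been produced via the eval-gen task.

python train.py \
  --task eval-mse \
  --experiment-name cgan_experiment001 \
  --model-name cgan \
  --dataset-name places365 \
  --eval-root-path ./eval \
  --eval-type colorized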

Colorization Task

The task of colorizing an image can be considered a pixel-wise regression problem where the model input X is a 1xHxW tensor containing the pixels of the grayscale image and the model output Y' is a tensor of shape nxHxW that represents the predicted colorization information (n = 2 when predicting the a and b channels described below). Specifically, the task aims to discover a mapping F: X → Y' that plausibly predicts the colorization given the greyscale input.

The CIE L*a*b* colour space lends itself well to this task since the L channel depicts the brightness of the image (X above) and the image colour is fully captured in the remaining a and b channels (Y' above). The L*a*b* colour model also has the advantage of being inspired by human colour perception, meaning that distances in L*a*b* space can be expected to be correlated with changes in human colour perception. The final output colorized image is created by recombining the input L layer with the predicted a and b layers.
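
As a minimal illustration of this split and recombination (not part of the framework itself; it assumes NumPy and scikit-image are available, and 'example.jpg' is just a placeholder path):

import numpy as np
from skimage import color, io

# Load an RGB image and convert it to the CIE L*a*b* colour space.
rgb = io.imread('example.jpg') / 255.0   # H x W x 3, values in [0, 1]
lab = color.rgb2lab(rgb)                 # H x W x 3: (L, a, b)

# Model input: the lightness channel (the X above).
L = lab[:, :, 0:1]                       # L roughly in [0, 100]

# Model target: the two colour channels (the Y' above).
ab = lab[:, :, 1:3]

# A colorization model would predict ab_hat from L; here the ground truth is
# reused purely to show the recombination step.
ab_hat = ab
colorized = color.lab2rgb(np.concatenate([L, ab_hat], axis=2))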

Colorization Models

Three colorization architectures are currently supported in the framework.

ResNet Colorization Network

This architecture consists of a CNN that starts with a set of convolutional layers which aim to extract low-level and semantic features from the input images, inspired by how representations are learned in Learning Representations for Automatic Colorization. Based on the same idea as the VGG-16-Gray architecture in that paper, a modified version of the ResNet-18 image-classification network is used as a means to learn representations from a set of images. In particular, the network is modified so that it accepts greyscale images and, in addition, it is truncated to six layers. This set of layers is used to extract features from the images that are represented by their lightness channels. Subsequently, a series of deconvolutional layers is applied to increase the spatial resolution of (i.e. 'upscale') the features. This up-scaling of features learned in a network is inspired by the 'upsampling' of features in the colorization network of Let There Be Color!
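
To make the idea concrete, here is a rough PyTorch sketch along these lines; the exact layer choices, channel sizes and class name are assumptions for illustration, not the framework's precise implementation (see src/models.py for that):

import torch.nn as nn
from torchvision import models

class ResNetColorizer(nn.Module):
    # Truncated ResNet-18 encoder on the lightness channel, followed by
    # deconvolutional layers that upscale the features and predict a*b*.
    def __init__(self):
        super().__init__()
        resnet = models.resnet18()
        # Accept 1-channel (greyscale) input instead of 3-channel RGB.
        resnet.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        # Keep only the first six children of the classification network.
        self.encoder = nn.Sequential(*list(resnet.children())[:6])
        # Upsample ('upscale') the extracted features back to the input size.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 2, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x):  # x: N x 1 x H x W lightness tensor
        return self.decoder(self.encoder(x))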

U-Net

This network is inspired by U-Net: Convolutional Networks for Biomedical Image Segmentation where direct connections are added between contracting and expanding layers of equal size to prevent the loss of spatial context of the original image throughout the layers. In Image Colorization with Generative Adversarial Networks an approach is proposed that uses such a network for colorization since the preservation of the original greyscale image is of particular importance to this task.

The network implemented in this framework has the same architecture as the one presented in the original U-Net paper, modified to take 224x224 inputs. Non-linearities are introduced by following the convolutional and deconvolutional layers with leaky ReLUs with a slope of 0.2. Furthermore, batch normalisation is applied after every layer.
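
A stripped-down sketch of the skip-connection idea is given below; the depth and channel sizes are illustrative only (the actual unet224 model in src/models.py follows the full U-Net layout):

import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    # Minimal two-level U-Net-style network: one contracting stage, one
    # expanding stage, and a skip connection between layers of equal size.
    def __init__(self):
        super().__init__()
        self.down = nn.Sequential(           # 1 x H x W -> 64 x H/2 x W/2
            nn.Conv2d(1, 64, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.2),
        )
        self.bottleneck = nn.Sequential(     # 64 x H/2 x W/2 -> 128 x H/4 x W/4
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2),
        )
        self.up = nn.Sequential(             # 128 -> 64, back to H/2 x W/2
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.2),
        )
        self.out = nn.ConvTranspose2d(64 + 64, 2, kernel_size=4, stride=2, padding=1)

    def forward(self, x):
        d = self.down(x)
        u = self.up(self.bottleneck(d))
        # Skip connection: concatenate encoder and decoder features of equal size,
        # preserving the spatial context of the original greyscale image.
        return self.out(torch.cat([u, d], dim=1))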

Conditional GAN (CGAN)

Recent research has demonstrated the potential of GAN architectures for image colorization tasks. One of the compelling aspects of using GANs is their ability to learn a loss function that is task-specific.

GANs consist of two networks: a generator and a discriminator. In the context of image colorization, the generator's task is to produce colorized images that are indistinguishable from real images. The discriminator's task is to classify whether a sample came from the generator or from the original set of images. Traditionally, the generator is represented by a mapping G: z → y, where z is a random noise variable which serves as the input of the generator. The discriminator is similarly represented by a mapping D: x → [0, 1], where x represents a real or synthetic input.

In the context of image colorization, the traditional GAN has to be modified into a Conditional GAN (CGAN) such that it takes image data as input instead of (random) noise. More specifically, the CGAN will take as input greyscale data (i.e. images represented by their lightness channel L in the L*a*b* colour space) and generate colorized images. The discriminator will be trained on both the generated colorized images and full-colour ground-truth images.

Formally, the main objective of the CGAN can be described by a single mini-max game problem:

min_G max_D V(D, G) = E_{x,y ~ p_data}[log D(x, y)] + E_{x ~ p_data}[log(1 - D(x, G(x)))]

where p_data represents the original image distribution. So informally, the generator tries to minimise the function by generating samples G(x) according to a mapping G that takes as input greyscale images x from the original data, while the discriminator tries to maximise the same function by trying to distinguish between real images y from the original data distribution and generated samples G(x).

In addition, the framework facilitates the addition of an L1-regularisation term in order to try to force the generator to produce results that are 'closer' (i.e. more similar) to images from the original data distribution. Theoretically, this should preserve the structure of the ground-truth images and, in addition, prevent the generator from producing images where it has given certain pixels or even whole image regions a random colour just to deceive the discriminator.
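
As a sketch of how such a combined generator objective might look in PyTorch (the weighting factor lambda_l1 and the way the discriminator is conditioned on the greyscale input are assumptions for illustration, not the framework's exact code):

import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # adversarial term (discriminator outputs raw logits)
l1 = nn.L1Loss()              # regularisation towards the ground-truth colours

def generator_loss(discriminator, grayscale, fake_ab, real_ab, lambda_l1=100.0):
    # Adversarial part: the generator wants the discriminator to label its
    # colorizations, conditioned on the greyscale input, as real.
    logits = discriminator(torch.cat([grayscale, fake_ab], dim=1))
    adversarial = bce(logits, torch.ones_like(logits))
    # L1 part: stay close to the ground-truth a*b* channels so the generator
    # cannot simply invent arbitrary colours to fool the discriminator.
    return adversarial + lambda_l1 * l1(fake_ab, real_ab)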

License: MIT License

