About

This project uses generative adversarial networks (GANs) to add color to black and white images. The generator network (G) receives the black and white version of an image and outputs a full RGB version (i.e. the black and white image with color added to it). The discriminator (D) then rates the quality of that RGB version. This quality measure is backpropagated through D and then through G, which lets G learn to colorize images correctly. The architectures used are modifications of the DCGAN schema. See this blog post for an alternative version, which uses standard convnets (i.e. no GANs) with VGG features.
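To make the training signal a bit more concrete, here is a minimal sketch of the two losses, assuming the usual binary cross-entropy objective from DCGAN-style setups. The exact loss used in this project isn't stated above, so treat the function names and the example scores as hypothetical.

```python
import math

def bce(p, target, eps=1e-7):
    """Binary cross-entropy for one probability p and a 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))

def discriminator_loss(d_real, d_fake):
    """D should rate original color images as 1 and G's colorizations as 0."""
    losses = [bce(p, 1) for p in d_real] + [bce(p, 0) for p in d_fake]
    return sum(losses) / len(losses)

def generator_loss(d_fake):
    """G is rewarded when D rates its colorizations as real (target 1)."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)

# Hypothetical D outputs (probability of "real") for a small batch.
d_on_originals = [0.9, 0.8]
d_on_colorized = [0.2, 0.4]
print(discriminator_loss(d_on_originals, d_on_colorized))
print(generator_loss(d_on_colorized))
```

In an actual training loop the gradient of `generator_loss` would flow back through D into G, which is the "backpropagated through D and then through G" step described above.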

Key results:

If a dataset of images can be generated by a GAN, then a GAN can also learn to add colors to it.
The task of adding colors seems to be a bit easier than the full generation of images.
G did not learn to add colors to rare and small elements (e.g. when coloring images of Christmas trees it didn't add color to presents below the trees, small baubles or clothes of people in the image). This might partly be a limitation of the architecture, which uses pooling layers in G (hence small elements might get lost).
G did not learn to correctly add colors to datasets with high variance (heterogeneous collections of images). It would resort to mostly just adding one or two colors everywhere.
I experimented with using VGG features but didn't have much success with those. G didn't seem to learn more than without VGG features. My tests were limited though due to hardware constraints (VGG + G + D = three big networks in memory). I did not try the hypercolumn approach that was used in the previously mentioned blog post.
Producing UV values in G and combining them with Y into a YUV image (which is then fed into D) failed. G just wouldn't learn anything. G had to output full RGB images to learn successfully. Not sure if there was a bug somewhere or if there's a good reason for that effect.
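For reference, the Y + UV combination mentioned in the last point could look roughly like the sketch below. It assumes the classic BT.601 analog YUV coefficients and pixel values in [0, 1]; the conversion actually used in the failed experiment isn't documented here, and `combine_y_with_uv` is a hypothetical helper name.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel (values in [0, 1]) to YUV (BT.601 weights)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v

def yuv_to_rgb(y, u, v):
    """Inverse of rgb_to_yuv."""
    r = y + v / 0.877
    b = y + u / 0.492
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b

def clip01(x):
    return min(max(x, 0.0), 1.0)

def combine_y_with_uv(y_image, uv_image):
    """Merge the known Y channel with (U, V) pairs predicted by G.

    y_image: rows of Y values; uv_image: rows of (U, V) tuples.
    Returns rows of RGB tuples clipped to [0, 1].
    """
    return [
        [tuple(clip01(c) for c in yuv_to_rgb(y, u, v))
         for y, (u, v) in zip(y_row, uv_row)]
        for y_row, uv_row in zip(y_image, uv_image)
    ]

# With U = V = 0 the pixel stays gray, whatever its brightness.
print(combine_y_with_uv([[0.5]], [[(0.0, 0.0)]]))
```

One property of this setup is that G can never change the brightness of a pixel, only its chroma, which is why it is attractive in principle even though it didn't train successfully here.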

Images

Colorizers were trained on multiple image datasets which were reused from previous projects. (I.e. multiple GANs were trained, not just one for all images. That's due to GANs not being very good at handling heterogeneous datasets.) Besides the datasets shown below, the MSCOCO 2014 validation dataset was also used, but G failed to learn much on that one (it mostly added just 1-3 uniform colors per image), hence the results of that run are not shown.

Notes:

There were no splits into training and validation sets (partly due to laziness, partly because GANs in my experience basically never just memorize the training set). Note how the coloring in the images below is often different from the original coloring.
Training times were usually quite fast (<=2 hours per dataset).
All generated color images were a little bit blurry, probably because G generated full RGB images instead of just adding color (UV in YUV). As a result, it had to learn to copy the Y channel information correctly while also adding colors.
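One rough way to check how well G copies the Y channel is to compare the luminance of its output with the gray input. The sketch below assumes the gray input was computed with BT.601 luminance weights and that pixel values lie in [0, 1]; `y_copy_error` is a hypothetical helper, not something from this project.

```python
def luminance(r, g, b):
    """BT.601 luminance of one RGB pixel."""
    return 0.299 * r + 0.587 * g + 0.114 * b

def y_copy_error(gray_image, rgb_image):
    """Mean absolute difference between the gray input and the
    luminance of the colorized output (0.0 means Y was copied exactly)."""
    total, count = 0.0, 0
    for gray_row, rgb_row in zip(gray_image, rgb_image):
        for gray, (r, g, b) in zip(gray_row, rgb_row):
            total += abs(gray - luminance(r, g, b))
            count += 1
    return total / count

# Hypothetical 1x2 image: the first pixel keeps its brightness, the second doesn't.
gray = [[0.5, 0.5]]
colorized = [[(0.5, 0.5, 0.5), (1.0, 1.0, 1.0)]]
print(y_copy_error(gray, colorized))
```

A value clearly above zero on held-out images would support the guess that some of the blur comes from G not reproducing the input brightness exactly.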

Human faces

This dataset worked fairly well. Notice the image in the 10th row at the far right, where G assigns a skin color to the microphone. Also notice how G usually doesn't add red color to the lips. Maybe they get lost during the pooling...?