anujdutt9 / ESPCN

Paper Implementation - "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network".

Home Page:https://arxiv.org/pdf/1609.05158.pdf

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

ESPCN

This repository is an attempt at implementing the paper Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network using PyTorch.

Setup

Please setup your training environemnt by installing the requirements using:

$ pip install -r requirements.txt

Training

To run the model training, use the following command:

$ python train.py -t ./dataset/train -v ./dataset/val -o ./assets/models/

This will train the model and save the model's weights as state_dict() to the assets/models/ folder.

Here's a GIF of model predictions during training:

Alt Text

If you see the face carefully, as the model learns and the PSNR value increases, the boxes/lines from the image and face start to disappear.

Inference

This repository comes with pre-trained model as well. To directly run inference on images, use the following command:

$ python infer.py -w ./assets/models/best.pth -i ./dataset/test/face.png -o ./datasets

NOTE: The pre-trained model provided with this repository is trained with a scaling factor of 3. Hence, the inference will up-scale the input image by a factor of 3. To use a different scaling factor, please train the model with apprropriate scaling factor value.

Export

The export.py script provide five options for exporting the model - ONNX, CoreML, TensorFlow, TFLite and TF.js. For example to convert the trained model to TFLite format, use the following command:

$ python export.py -i ./assets/models/best.pth -o ./assets/models -f TFLite

Model Statistics

The following are the model statistics for an input image of shape [1, 1, 224, 224]:

Input size (MB): 0.20
Forward/backward pass size (MB): 42.15
Params size (MB): 0.09
Estimated Total Size (MB): 42.44
------------------------
Total params: 22,729
------------------------
Total memory: 40.20MB
Total MAdd: 2.27GMAdd
Total Flops: 1.14GFlops
Total MemR+W: 38.75MB

Results

The following are some images comparing the original High Resolution image vs Image up-sampled using Bi-cubic up-sampling vs Super Resolution using this model. Note that the up-scaling factor for bi-cubic up-sampling and the super-resolution model is set to 3. For using a different up-scaling factor, please re-train the model.

Original BICUBIC x3 ESPCN x3 (PSNR: 24.14 dB)
Original BICUBIC x3 ESPCN x3 (PSNR: 32.40 dB)
Original BICUBIC x3 ESPCN x3 (PSNR: 28.36 dB)
Original BICUBIC x3 ESPCN x3 (PSNR: 28.65 dB)

NOTE: PSNR was calculated on Y channel images.

Contributions

If you find a bug or have any other issue/question, please create a GitHub issue, or submit a pull request.

Credits

Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network

@article{DBLP:journals/corr/ShiCHTABRW16,
  author    = {Wenzhe Shi and
               Jose Caballero and
               Ferenc Husz{\'{a}}r and
               Johannes Totz and
               Andrew P. Aitken and
               Rob Bishop and
               Daniel Rueckert and
               Zehan Wang},
  title     = {Real-Time Single Image and Video Super-Resolution Using an Efficient
               Sub-Pixel Convolutional Neural Network},
  journal   = {CoRR},
  volume    = {abs/1609.05158},
  year      = {2016},
  url       = {http://arxiv.org/abs/1609.05158},
  archivePrefix = {arXiv},
  eprint    = {1609.05158},
  timestamp = {Mon, 13 Aug 2018 16:47:09 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/ShiCHTABRW16.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Is the deconvolution layer the same as a convolutional layer? A note on Real­Time Single Image and Video Super­Resolution Using an Efficient Sub­Pixel Convolutional Neural Network.

@article{article,
author = {Shi, Wenzhe and 
          Caballero, Jose and 
          Theis, Lucas and 
          Huszar, Ferenc and 
          Aitken, Andrew and 
          Ledig, Christian and 
          Wang, Zehan},
year = {2016},
month = {09},
pages = {},
title = {Is the deconvolution layer the same as a convolutional layer?}
}

About

Paper Implementation - "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network".

https://arxiv.org/pdf/1609.05158.pdf

License:MIT License


Languages

Language:Python 100.0%