face-competition / R_Unet

Video prediction using LSTM and U-Net

R_Unet

This project applies a recurrent method on top of U-Net to perform pixel-level video frame prediction.
Part of our results is published at IEEE GCCE 2020. pdf.

Brief introduction

Taking advantage of LSTM and the U-Net encoder-decoder, we aim to predict the next n frame(s).
We currently use either a 2-layer LSTM network (V1) or a convolutional LSTM (V2) as the RNN applied to the latent features of U-Net.
Our latest v4 model uses a convolutional LSTM at each level and extends the shortcut connections introduced in v2.
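To make the recurrent step concrete, here is a minimal sketch of a convolutional LSTM cell rolled over latent feature maps. This is not the repository's actual code: for brevity the convolutions are 1x1 (a per-pixel linear map over channels, done with NumPy), and all weight names and shapes are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def convlstm_step(x, h, c, Wx, Wh, b):
    """One ConvLSTM step on a feature map (hypothetical sketch).

    x:  input features,  shape (C_in, H, W)
    h:  hidden state,    shape (C_hid, H, W)
    c:  cell state,      shape (C_hid, H, W)
    Wx: input weights,   shape (4*C_hid, C_in)   (1x1 conv)
    Wh: hidden weights,  shape (4*C_hid, C_hid)  (1x1 conv)
    b:  bias,            shape (4*C_hid,)
    """
    # A 1x1 convolution is just a per-pixel linear map over channels.
    gates = (np.einsum('oc,chw->ohw', Wx, x)
             + np.einsum('oc,chw->ohw', Wh, h)
             + b[:, None, None])
    C = h.shape[0]
    i = sigmoid(gates[0*C:1*C])   # input gate
    f = sigmoid(gates[1*C:2*C])   # forget gate
    o = sigmoid(gates[2*C:3*C])   # output gate
    g = np.tanh(gates[3*C:4*C])   # candidate cell state
    c_next = f * c + i * g
    h_next = o * np.tanh(c_next)
    return h_next, c_next

# Roll the cell over a short sequence of latent feature maps.
rng = np.random.default_rng(0)
C_in, C_hid, H, W = 8, 16, 4, 4
Wx = rng.standard_normal((4 * C_hid, C_in)) * 0.1
Wh = rng.standard_normal((4 * C_hid, C_hid)) * 0.1
b = np.zeros(4 * C_hid)
h = np.zeros((C_hid, H, W))
c = np.zeros((C_hid, H, W))
for _ in range(5):
    x = rng.standard_normal((C_in, H, W))
    h, c = convlstm_step(x, h, c, Wx, Wh, b)
```

In the real models, each 1x1 map above would be a spatial (e.g. 3x3) convolution so the state keeps local spatial structure; the gating equations are the same.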

We are also training a v4_mask model, which takes a mask and an image as input and predicts both a mask and an image as output.
This model has the same structure as v4, but changes the output layer so that it also produces a mask tensor.
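One way such an output-layer change might look (a hedged sketch, not the repository's code): a final 1x1 convolution mapping decoder features to 3 image channels plus 1 mask channel, with a sigmoid on the mask. The function name, channel counts, and weights below are all assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mask_image_head(features, W, b):
    """Hypothetical v4_mask-style output layer as a 1x1 convolution.

    features: decoder features, shape (C, H, W)
    W:        weights, shape (4, C) -> 3 image channels + 1 mask channel
    b:        bias, shape (4,)
    """
    out = np.einsum('oc,chw->ohw', W, features) + b[:, None, None]
    image = out[:3]             # predicted RGB frame (raw values here)
    mask = sigmoid(out[3:4])    # predicted mask, squashed into (0, 1)
    return image, mask

rng = np.random.default_rng(1)
feats = rng.standard_normal((32, 8, 8))
W = rng.standard_normal((4, 32)) * 0.1
b = np.zeros(4)
image, mask = mask_image_head(feats, W, b)
```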

Usage

  • configuration: config.json
  • configuration parser: parse_arguement.py
  • training script: train.py
  • V1 model: R_Unet_v1.py
  • V2 model: R_Unet_ver_2.py
  • V4 model: R_Unet_ver_4.py

To train the v1 model: `python3 train.py config`
To train the other models: `python3 train_v2.py config`
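The keys of config.json are not documented here; a hypothetical sketch of what such a training configuration might contain (every field name below is an assumption, so check config.json in the repository for the real ones):

```json
{
  "model": "v4",
  "epochs": 100,
  "batch_size": 4,
  "learning_rate": 0.0001,
  "input_frames": 4,
  "predict_frames": 1,
  "data_dir": "./data",
  "checkpoint_dir": "./checkpoints"
}
```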

Our Model Architecture

We are currently working on an improved model using convolutional LSTM, named runet_v2.

  • model v1: (architecture diagram)

  • model v2: (architecture diagram)

  • model v4: (architecture diagram)



Results

Frame prediction: (predicted frame vs. ground-truth images)
Mask prediction: (predicted mask vs. ground-truth images)

References

[1] Stochastic Adversarial Video Prediction, CVPR 2018
[2] High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks, NeurIPS 2019
[3] convLSTM - the convolutional LSTM implementation used in this project

Hsu Mu Chien, Watanabe Lab, Department of Fundamental Science and Engineering, Waseda University.
