This is an implementation of the paper Video Summarization by Learning from Unpaired Data (CVPR 2019).
The FCSN architecture in the image above is from Video Summarization Using Fully Convolutional Sequence Networks (ECCV 2018).
- Ubuntu 18.04.1 LTS
- python 3.6.7
- numpy 1.15.4
- pytorch 1.1.0
- torchvision 0.3.0
- tqdm 4.32.1
- tensorboardX 1.6
$ cd && git clone https://github.com/pcshih/pytorch-VSLUD.git && cd pytorch-VSLUD
$ mkdir saved_models
3. Download datasets.zip (this dataset is from here) into the project folder and unzip it:
$ unzip datasets.zip
$ python3 training_set_preparation.py
$ python3 train.py
$ tensorboard --logdir runs --port 6006
Sorry for my poor coding; I am new to PyTorch and deep learning.
The loss curves above are not reasonable during GAN training.
"The decoder of FCSN consists of several temporal deconvolution operations which produces a vector of prediction scores with the same length as the input video. Each score indicates the likelihood of the corresponding frame being a key frame or non-key frame. Based on these scores, we select k key frames to form the predicted summary video." (from the paper Video Summarization by Learning from Unpaired Data, CVPR 2019)
I implement "we select k key frames to form the predicted summary video" with torch.index_select(input, dim, index, out=None).
Is torch.index_select(input, dim, index, out=None) differentiable during training? Is this the main problem causing the training to fail?
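A quick sanity check on this question (a minimal sketch, not the repo's actual code; the tensor shapes and names are illustrative): torch.index_select does propagate gradients back to its input tensor, but the index argument is a discrete integer tensor, so no gradient flows through the frame-selection decision itself (e.g. through torch.topk over the scores):

```python
import torch

# Frame-level features: (batch=1, channels=4, T=10 frames) -- illustrative shape
features = torch.randn(1, 4, 10, requires_grad=True)

# Per-frame prediction scores (stand-in for the FCSN decoder output)
scores = torch.randn(10)

# Select the k highest-scoring frames; topk returns (values, indices)
k = 3
_, idx = torch.topk(scores, k)

# Gather the selected frames along the temporal dimension
summary = torch.index_select(features, dim=2, index=idx)  # shape (1, 4, 3)

# Gradients do flow back to `features`: index_select is differentiable
# with respect to its input tensor.
summary.sum().backward()
print(features.grad is not None)  # True

# But `idx` is a discrete LongTensor: there is no gradient path through
# the selection itself, so which frames get picked is not learned directly.
print(idx.requires_grad)  # False
```

So the gradient reaches only the features of the frames that were selected; the hard top-k choice is non-differentiable, which is one plausible reason GAN training could stall.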
Please feel free to contact me via email (pcshih.cs07g@nctu.edu.tw) or open an issue if you have any suggestions.
I would be very grateful.