

Joint Event Detection and Description in Continuous Video Streams

Code released by Huijuan Xu (Boston University).

Introduction

We present the Joint Event Detection and Description Network (JEDDi-Net) that solves the dense captioning task in an end-to-end fashion. Our model continuously encodes the input video stream with three-dimensional convolutional layers, proposes variable-length temporal events based on pooled features, and transcribes the event proposals into captions with the consideration of visual and language context.
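
As a rough illustration of the data flow described above, here is a minimal Python sketch (this is not the actual Caffe network definition; c3d_encoder, spn, and captioner are hypothetical placeholders for the three trained components, and features is assumed to be a NumPy-like array):

    def jeddi_net_forward(frames, c3d_encoder, spn, captioner):
        # 1. Encode the continuous frame stream with 3D convolutional layers.
        features = c3d_encoder(frames)              # NumPy-like array, shape (T, D)
        # 2. Propose variable-length temporal events from pooled features.
        proposals = spn(features)                   # list of (start, end, score)
        # 3. Transcribe each proposal into a caption, conditioning on the
        #    visual context (whole-video feature) and the language context
        #    (captions decoded so far).
        captions = []
        for start, end, _score in proposals:
            event_feature = features[start:end].mean(axis=0)   # pooled event feature
            visual_context = features.mean(axis=0)             # whole-video context
            captions.append(captioner(event_feature, visual_context, captions))
        return proposals, captions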

License

JEDDi-Net is released under the MIT License (refer to the LICENSE file for details).

Citing JEDDi-Net

If you find JEDDi-Net useful in your research, please consider citing:

@inproceedings{xu2019joint,
  title={Joint Event Detection and Description in Continuous Video Streams},
  author={Xu, Huijuan and Li, Boyang and Ramanishka, Vasili and Sigal, Leonid and Saenko, Kate},
  booktitle={2019 IEEE Winter Conference on Applications of Computer Vision (WACV)},
  year={2019}
}

Contents

  1. Installation
  2. Preparation
  3. Training
  4. Testing

Installation:

  1. Clone the JEDDi-Net repository.

    git clone --recursive git@github.com:VisionLearningGroup/JEDDi-Net.git
  2. Build Caffe3d with pycaffe (see: Caffe installation instructions).

    Note: Caffe must be built with Python support!

    cd ./caffe3d

    # If you have all of the requirements installed and your Makefile.config in place, simply run:
    make -j8 && make pycaffe
  3. Build the JEDDi-Net lib folder.

    cd ./lib    
    make

Preparation:

  1. Download the ground-truth annotations and videos of the ActivityNet Captions dataset.

  2. Extract frames from the downloaded videos at 25 fps (a sketch of one way to do this is shown after this list).

  3. Generate the pickle data for training and testing the JEDDi-Net model.

    cd ./preprocess
    # generate training data
    python generate_train_roidb_sorted.py
    # generate validation data
    python generate_val_roidb.py  
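
One possible way to do the frame extraction in step 2, assuming ffmpeg is installed (the directory layout and frame-naming pattern below are assumptions; check them against the preprocessing scripts in ./preprocess before use):

    import subprocess
    from pathlib import Path

    VIDEO_DIR = Path("videos")   # hypothetical: the downloaded ActivityNet videos
    FRAME_DIR = Path("frames")   # hypothetical: output root for extracted frames

    def extract_frames(video_path, out_dir, fps=25):
        """Decode a video into numbered JPEG frames at a fixed frame rate."""
        out_dir.mkdir(parents=True, exist_ok=True)
        subprocess.run([
            "ffmpeg", "-i", str(video_path),
            "-r", str(fps),                      # resample to 25 fps
            "-q:v", "2",                         # high-quality JPEGs
            str(out_dir / "image_%05d.jpg"),
        ], check=True)

    for video in sorted(VIDEO_DIR.glob("*.mp4")):
        extract_frames(video, FRAME_DIR / video.stem)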

Training:

  1. Download the separately trained segment proposal network (SPN) and captioning models to ./pretrain/.

  2. In the JEDDi-Net root folder, run:

    bash ./experiments/denseCap_jeddiNet_end2end/script_train.sh

Testing:

  1. Download a sample JEDDi-Net model to ./snapshot/.

    A JEDDi-Net model trained on the ActivityNet Captions dataset is provided: caffemodel.

    The provided JEDDi-Net model achieves a METEOR score of ~8.58% on the validation set.

  2. In the JEDDi-Net root folder, generate the prediction log file on the validation set:

    bash ./experiments/denseCap_jeddiNet_end2end/test/script_test.sh 
  3. Generate the results.json file from the prediction log file (a Python sketch of this conversion is shown after this list).

    cd ./experiments/denseCap_jeddiNet_end2end/test/
    bash bash.sh
  4. Follow the evaluation code to obtain the evaluation results.
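
The conversion in step 3 is handled by bash.sh; for reference, here is a minimal Python sketch that produces a results.json in the format expected by the ActivityNet Captions evaluation (the log-line format and file names below are assumptions, not the actual output of script_test.sh):

    import json
    import re
    from collections import defaultdict

    # Hypothetical log-line format: "<video_id>\t<start>\t<end>\t<sentence>"
    LINE_RE = re.compile(r"^(v_\S+)\t([\d.]+)\t([\d.]+)\t(.+)$")

    results = defaultdict(list)
    with open("test_log.txt") as f:              # hypothetical log file name
        for line in f:
            match = LINE_RE.match(line.rstrip("\n"))
            if match:
                vid, start, end, sentence = match.groups()
                results[vid].append({
                    "sentence": sentence,
                    "timestamp": [float(start), float(end)],
                })

    # Top-level schema used by the official ActivityNet Captions evaluator.
    with open("results.json", "w") as f:
        json.dump({
            "version": "VERSION 1.0",
            "results": results,
            "external_data": {"used": False, "details": ""},
        }, f)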
