JA-POLS

Authors: Irit Chelly, Vlad Winter, Dor Litvak, David Rosen, and Oren Freifeld.

This code repository corresponds to our CVPR '20 paper: JA-POLS: a Moving-camera Background Model via Joint Alignment and Partially-overlapping Local Subspaces. JA-POLS is a novel 2D-based method for unsupervised learning of a moving-camera background model, which is highly scalable and allows for relatively free camera motion.

JA-POLS typical results

A detailed description of our method and more example results can be found here:

  • Paper
  • Supplemental Material
  • Example Results

Acknowledgements:
This work was partially funded by the Shulamit Aloni Scholarship from Israel's Ministry of Technology and Science, and by BGU's Hi-Tech Scholarship.

Requirements

  • Python: most of the code runs in Python and uses the following packages: numpy, matlab.engine, scipy, tensorflow, torch, openCV, imageio, scikit-image, and other common Python packages.
  • MATLAB (for the SE-Sync part)
  • C++: if you choose the TGA method for learning the local subspaces (see Module 2 below), please follow the TGA requirements. All steps should be performed in the TGA folder: 2_learning/BG/TGA-PCA.

For a minimal working example, use the Tennis sequence (the input images are already located in the input folder in this repository).

Installation

Instructions and Description

The JA-POLS method includes 3 phases that run as separate modules:

  • Joint alignment: align all input images to a common coordinate system
  • Learning of two tasks:
    • Partially-overlapping Local Subspaces (the background)
    • Alignment prediction
  • BG/FG separation for a (previously-unseen) input frame

Configuration parameters: the file config.py includes all required parameters for the 3 modules.

Before running the code, set the following config parameters:

Your local path to the JA-POLS folder:

paths = dict(
    my_path = '/PATH_TO_JAPOLS_CODE/JA-POLS/',
)

The size of a single input frame (height, width, depth):

images = dict(
    img_sz = (250, 420, 3),
)

All 3 modules should run from the source folder JA-POLS/.
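
Optionally, you can verify these two settings before running the modules. Below is a minimal sanity-check sketch (not part of the repository); it assumes the learning images are PNGs already placed in input/learning/images/ and that it is run from the JA-POLS/ root so that config.py is importable:

# check_config.py -- illustrative sanity check, not part of the repository.
import glob
import os

import imageio

import config  # the repository's config.py (run from the JA-POLS/ root)

my_path = config.paths['my_path']
h, w, d = config.images['img_sz']

# Read the first learning image (assumed to be a PNG in input/learning/images/).
img_files = sorted(glob.glob(os.path.join(my_path, 'input', 'learning', 'images', '*.png')))
img = imageio.imread(img_files[0])

assert img.shape == (h, w, d), f'img_sz {(h, w, d)} does not match the actual size {img.shape}'
print('Frame size OK:', img.shape)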

Module 1: Joint Alignment

Code:
Main function: 1_joint_alignment/main_joint_alignment.py

Input:
A video or a sequence of images from which the BG model will be learned.
The video or the images should be located in input/learning/video or input/learning/images, respectively.

Output:

  • data/final_AFFINE_trans.npy: affine transformations for all input images.
    (In this file, record i contains the affine transformation (6-parameters vector) that is associated with input image i).
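
For example, reading this file back and warping one of the input frames could look like the sketch below. The row-major 2x3 ordering of the 6-parameter vector is an assumption; check the Module 1 code before relying on it.

# Illustrative only: load the per-frame affine transformations produced by Module 1
# and warp input image i accordingly.
import numpy as np
import cv2

thetas = np.load('data/final_AFFINE_trans.npy')  # assumed shape: (num_images, 6)
imgs = np.load('data/imgs.npy')                  # stored input images (listed as a Module 2 input)

i = 0
A = thetas[i].reshape(2, 3).astype(np.float32)   # assumed layout: [[a, b, tx], [c, d, ty]]
h, w = imgs[i].shape[:2]
warped = cv2.warpAffine(imgs[i], A, (w, h))      # dtype/scale of imgs.npy may need adjusting
cv2.imwrite('warped_example.png', warped)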

Required params in config.py:
Data type (video or a sequence of images), and relevant info about the input data:

se = dict(
    data_type = 'images',  # choose from: ['images', 'video']
    video_name = 'jitter.mp4',  # relevant when data_type = 'video'
    img_type = '*.png',  # relevant when data_type = 'images'
)
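
If your data arrives as a video but you want to inspect or pre-process the individual frames yourself, a minimal extraction sketch using OpenCV (already a requirement) is shown below; Module 1 can also read the video directly when data_type = 'video', so this is purely a convenience:

# Illustrative only: dump the frames of input/learning/video/jitter.mp4 into
# input/learning/images/ as numbered PNGs.
import os
import cv2

video_path = 'input/learning/video/jitter.mp4'
out_dir = 'input/learning/images'
os.makedirs(out_dir, exist_ok=True)

cap = cv2.VideoCapture(video_path)
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imwrite(os.path.join(out_dir, f'{idx:05d}.png'), frame)
    idx += 1
cap.release()
print(f'Extracted {idx} frames to {out_dir}')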

Parameters for the spatial transformer net (when estimating the affine transformations):

stn = dict(
    device = '/gpu:0',   # choose from: ['/gpu:0', '/gpu:1', '/cpu:0']
    load_model = False,  # 'False' when learning a model from scratch, 'True' when using a trained network's model
    iter_per_epoch = 2000, # number of iterations 
    batch_size = 10,
)

The rest of the parameters can be left at their current (default) values.

Description:
Here we solve a joint-alignment problem; the loss function we minimize is given in the paper.

High-level steps:

  1. Compute relative transformations for pairs of input images (according to the graph)
  2. Run the SE-Sync framework to obtain absolute SE transformations for each frame
  3. Transform the images according to the absolute SE transformations
  4. Estimate residual affine transformations by optimizing the joint-alignment loss with a Spatial Transformer Network (STN)
  5. End up with absolute affine transformations for each of the input images (see the composition sketch below)
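
For intuition, composing an absolute SE(2) transform (step 2) with a residual affine correction (step 4) into a single absolute affine (step 5) can be written in homogeneous coordinates as in the sketch below; the parameter conventions and the composition order are assumptions here, not necessarily the repository's exact layout:

# Illustrative composition of a rigid SE(2) transform with a residual affine
# correction into one absolute affine, in homogeneous 3x3 form.
import numpy as np

def se2_to_matrix(angle, tx, ty):
    # Rigid 2D transform as a 3x3 homogeneous matrix.
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0,  0,  1]])

def affine6_to_matrix(theta):
    # 6-parameter affine (assumed row-major 2x3) as a 3x3 homogeneous matrix.
    M = np.eye(3)
    M[:2, :] = np.asarray(theta, dtype=float).reshape(2, 3)
    return M

T_se = se2_to_matrix(angle=0.05, tx=12.0, ty=-3.0)            # from SE-Sync (step 2)
T_res = affine6_to_matrix([1.01, 0.0, 1.5, 0.0, 0.99, -0.5])  # from the STN (step 4)

T_abs = T_res @ T_se       # composition order is an assumption
print(T_abs[:2, :])        # the 2x3 absolute affine for this frame (step 5)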


Module 2: Learning

Code:
Main function: 2_learning/main_learning.py

Input:
Files that were prepared in module 1:

  • data/final_AFFINE_trans.npy
  • data/imgs.npy
  • data/imgs_big_embd.npy

Output:

  • data/subspaces/: local subspaces for the background learning.
  • 2_learning/Alignment/models/best_model.pt: model of a trained net for the alignment prediction.
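
If you want to inspect the trained alignment model outside the pipeline, the sketch below shows one way to load it; whether best_model.pt stores a fully serialized model or only a state dict depends on how 2_learning/Alignment saves it, so treat this as an assumption and adjust accordingly:

# Illustrative only: peek at the saved alignment-prediction model.
import torch

ckpt_path = '2_learning/Alignment/models/best_model.pt'
# Note: recent PyTorch versions may require weights_only=False to unpickle a full model.
ckpt = torch.load(ckpt_path, map_location='cpu')

if isinstance(ckpt, torch.nn.Module):
    ckpt.eval()            # a fully serialized model was saved
    print(ckpt)
else:
    # Likely a state dict (or a checkpoint dict wrapping one): instantiate the
    # regressor class from 2_learning/Alignment and call load_state_dict on it.
    print(type(ckpt))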

Required params in config.py:
Local-subspaces learning:
Method type of the background-learning algorithm that will run on each local domain:

pols = dict(
    method_type = 'PRPCA',  # choose from: [PCA / RPCA-CANDES / TGA / PRPCA]
)
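
For intuition, learning a single local subspace with plain PCA amounts to an SVD of the mean-centered, vectorized crops that fall in that local domain. The sketch below uses synthetic data and arbitrary dimensions; it is not the repository's implementation, which also offers the robust variants listed above:

# Illustrative PCA subspace for one local domain: stack the vectorized crops,
# center them, and keep the top-k principal directions.
import numpy as np

rng = np.random.default_rng(0)
n_frames, domain_pixels, k = 40, 32 * 32 * 3, 5

X = rng.standard_normal((n_frames, domain_pixels))   # stand-in for real crops
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
basis = Vt[:k]                                        # (k, domain_pixels) local subspace

# Background estimate of a new crop: project onto the subspace and reconstruct.
x_new = rng.standard_normal(domain_pixels)
bg = mean + basis.T @ (basis @ (x_new - mean))
print(basis.shape, bg.shape)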

The rest of the parameters can be left at their current (default) values.

Alignment-prediction learning:
Parameters for the regressor net (when learning a map between images and transformations):

regress_trans = dict(
    load_model = False,  # 'False' when learning a model from scratch, 'True' when using a trained network's model
    gpu_num = 0,  # index of the GPU to use (in case there is more than one)
)
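
For reference, gpu_num maps to a torch device in the usual way (a trivial sketch; it simply falls back to the CPU when CUDA is unavailable):

import torch

gpu_num = 0  # as set in regress_trans['gpu_num']
device = torch.device(f'cuda:{gpu_num}' if torch.cuda.is_available() else 'cpu')
print(device)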

The rest of the parameters can be left at their current (default) values.

Description:
Here we learn two tasks, based on the affine transformations that were learned in Module 1: (1) the partially-overlapping local subspaces that model the background, and (2) a regressor that predicts the alignment (i.e., the transformation) of a frame directly from its pixels.

Module 3: Background/Foreground Separation

Code:
Main function: 3_bg_separation/main_bg_separation.py

Input:
A video or a sequence of test images for BG/FG separation.
The video or the images should be located in input/test/video or input/test/images, respectively.

Output:

  • output/bg/: background for each test image.
  • output/fg/: foreground for each test image.
  • output/img/: original test images.
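
After Module 3 has run, a quick way to view one result triplet side by side is sketched below; the file naming inside output/bg/, output/fg/, and output/img/ is an assumption, so list the folders to see the actual names:

# Illustrative only: stitch an original frame, its estimated background, and its
# foreground into a single panel for visual inspection.
import glob
import os

import cv2
import numpy as np

def first_image(folder):
    files = sorted(glob.glob(os.path.join(folder, '*')))
    return cv2.imread(files[0])

img = first_image('output/img')
bg = first_image('output/bg')
fg = first_image('output/fg')

panel = np.concatenate([img, bg, fg], axis=1)   # assumes identical frame sizes
cv2.imwrite('separation_example.png', panel)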

Required params in config.py:
Data type (video or a sequence of test images), and relevant info about the input data:

bg_tool = dict(
    data_type = 'images',  # choose from: ['images', 'video']
    video_name = 'jitter.mp4',  # relevant when data_type = 'video'
    img_type = '*.png',  # relevant when data_type = 'images'
)

Indicate which test images to process: 'all' (all test data), 'subsequence' (a subsequence of the image list), or 'idx_list' (a list of specific frame indices, 0-based).
If you choose 'subsequence', set "start_frame" and "num_of_frames" accordingly.
If you choose 'idx_list', insert a list of indices in "idx_list".

bg_tool = dict(
    which_test_frames='idx_list',  # choose from: ['all', 'subsequence', 'idx_list']
    start_frame=0,
    num_of_frames=20,
    idx_list=(2,15,39),
)
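
The three modes correspond to a simple selection over the ordered list of test frames; the sketch below illustrates the intended semantics (it is not the repository's code):

# Illustrative semantics of which_test_frames: 'all', 'subsequence', 'idx_list'.
def select_frames(frames, which_test_frames, start_frame=0, num_of_frames=20, idx_list=()):
    if which_test_frames == 'all':
        return list(frames)
    if which_test_frames == 'subsequence':
        return list(frames[start_frame:start_frame + num_of_frames])
    if which_test_frames == 'idx_list':
        return [frames[i] for i in idx_list]   # 0-based indices
    raise ValueError(f'unknown mode: {which_test_frames}')

# Example: pick frames 2, 15 and 39 out of 50 test frames.
print(select_frames(list(range(50)), 'idx_list', idx_list=(2, 15, 39)))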

Indicate whether or not to use the ground-truth transformations, in case you process images from the original (learning) video.
When processing learning images, set it to True.
When processing unseen images, set it to False.

bg_tool = dict(
    use_gt_theta = True,
)

The rest of the parameters can be left at their current (default) values.

Copyright and License

This software is released under the MIT License (included with the software). Note, however, that if you are using this code (and/or the results of running it) to support any form of publication (e.g., a book, a journal paper, a conference paper, a patent application, etc.), then we request that you cite our paper:

@inproceedings{chelly2020ja,
  title={JA-POLS: a Moving-camera Background Model via Joint Alignment and Partially-overlapping Local Subspaces},
  author={Chelly, Irit and Winter, Vlad and Litvak, Dor and Rosen, David and Freifeld, Oren},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={12585--12594},
  year={2020}
}
