PAVS10K
Figure 1: An example from our PAVS10K, where coarse-to-fine annotations are provided under the guidance of fixations collected in subjective experiments with multiple (N) subjects wearing Head-Mounted Displays (HMDs) and headphones. Each of the T equirectangular (ER) video frames (e.g., fk, fl and fn, where k, l and n are random integers in [1, T]) of the sequence “Speaking” (super-class)-“Brothers” (sub-class) is manually labeled with both object-level and instance-level pixel-wise masks. Based on the characteristics of the salient objects defined in each sequence, multiple attributes, e.g., “multiple objects” (MO), “competing sounds” (CS), “geometrical distortion” (GD), “motion blur” (MB), “occlusions” (OC) and “low resolution” (LR), are further annotated to enable detailed analysis for PAV-SOD modeling.
Figure 2: Summary of widely used salient object detection (SOD)/video object segmentation (VOS) datasets and PAVS10K. #Img: The number of images/video frames. #GT: The number of object-level pixel-wise masks (ground truth for SOD). Pub. = Publication. Obj.-Level = Object-Level Labels. Ins.-Level = Instance-Level Labels. Fix. GT = Fixation Maps. † denotes equirectangular images.
Figure 3: Examples of challenging attributes on equirectangular images from our PAVS10K, with instance-level ground truth and fixations as annotation guidance. {𝑓𝑘, 𝑓𝑙, 𝑓𝑛} denote random frames of a given video.
Figure 4: Statistics of the proposed PAVS10K. (a) Super-/sub-category information. (b) Instance density (labeled frames per sequence) of each sub-class. (c) Sound sources of PAVS10K scenes, such as musical instruments, human instances and animals.
Benchmark Models
No. | Year | Pub. | Title | Links |
---|---|---|---|---|
01 | 2019 | CVPR | Cascaded Partial Decoder for Fast and Accurate Salient Object Detection | Paper/Code |
02 | 2019 | CVPR | See More, Know More: Unsupervised Video Object Segmentation with Co-Attention Siamese Networks | Paper/Code |
03 | 2019 | ICCV | Stacked Cross Refinement Network for Edge-Aware Salient Object Detection | Paper/Code |
04 | 2019 | ICCV | Semi-Supervised Video Salient Object Detection Using Pseudo-Labels | Paper/Code |
05 | 2020 | AAAI | F³Net: Fusion, Feedback and Focus for Salient Object Detection | Paper/Code |
06 | 2020 | AAAI | Pyramid Constrained Self-Attention Network for Fast Video Salient Object Detection | Paper/Code |
07 | 2020 | CVPR | Multi-scale Interactive Network for Salient Object Detection | Paper/Code |
08 | 2020 | CVPR | Label Decoupling Framework for Salient Object Detection | Paper/Code |
09 | 2020 | ECCV | Highly Efficient Salient Object Detection with 100K Parameters | Paper/Code |
10 | 2020 | ECCV | Suppress and Balance: A Simple Gated Network for Salient Object Detection | Paper/Code |
11 | 2020 | BMVC | Making a Case for 3D Convolutions for Object Segmentation in Videos | Paper/Code |
12 | 2020 | SPL | FANet: Features Adaptation Network for 360° Omnidirectional Salient Object Detection | Paper/Code |
13 | 2021 | CVPR | Reciprocal Transformations for Unsupervised Video Object Segmentation | Paper/Code |
CAV-Net
The code is available at src.
The pre-trained models can be downloaded at Google Drive.
Dataset Downloads
The whole object-/instance-level ground truth with default split can be downloaded from Google Drive.
The videos (with ambisonics) with default split can be downloaded from Google Drive.
The head movement and eye fixation data can be downloaded from Google Drive.
To generate video frames, please refer to video_to_frames.py.
To get access to raw videos on YouTube, please refer to video_seq_link.
Note: The PAVS10K dataset does not own the copyright of the videos. Access to PAVS10K is limited to researchers and educators who wish to use the videos for non-commercial research and/or educational purposes.
Note: If you find our work helpful, please cite this page in your research paper. Thanks.
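For the frame-generation step mentioned above, the following is a minimal sketch of one way to dump a downloaded video to numbered image frames by calling the `ffmpeg` command-line tool. It is not the repository's video_to_frames.py, which may work differently; the function names, the file paths, the PNG output format and the six-digit frame naming are all assumptions made for illustration, and `ffmpeg` is assumed to be installed and on the `PATH`.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_ffmpeg_command(video_path: str, out_dir: str,
                         fps: Optional[float] = None) -> List[str]:
    """Build an ffmpeg argument list that writes a video as numbered PNG frames.

    Hypothetical helper for illustration; not part of the PAVS10K code base.
    """
    cmd = ["ffmpeg", "-i", str(video_path)]
    if fps is not None:
        # ffmpeg's fps video filter resamples the stream to the given rate.
        cmd += ["-vf", f"fps={fps}"]
    # %06d is ffmpeg's image-sequence pattern: 000001.png, 000002.png, ...
    cmd.append((Path(out_dir) / "%06d.png").as_posix())
    return cmd


def extract_frames(video_path: str, out_dir: str,
                   fps: Optional[float] = None) -> None:
    """Create out_dir if needed and run ffmpeg; raises if ffmpeg fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(video_path, out_dir, fps), check=True)
```

A call such as `extract_frames("video.mp4", "frames/video", fps=1)` (both paths are placeholders) would write one frame per second of video into `frames/video`. Omitting `fps` keeps every frame of the source video.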
Contact
yi.panoash@gmail.com or fang-yi.chao@tcd.ie (for details of head movement and eye fixation data).