stoneMo / SLAVC

Official Codebase of "A Closer Look at Weakly-Supervised Audio-Visual Source Localization" (NeurIPS 2022)

A Closer Look at Weakly-Supervised Audio-Visual Source Localization

Official codebase for SLAVC.

SLAVC is a new approach to weakly-supervised visual sound source localization that identifies negatives and mitigates the significant overfitting problems seen in this setting.

A Closer Look at Weakly-Supervised Audio-Visual Source Localization
Shentong Mo, Pedro Morgado
NeurIPS 2022.

[Figure: SLAVC illustration]

Environment

To set up the environment, run:

pip install -r requirements.txt

Datasets

Flickr-SoundNet

Data can be downloaded from "Learning to localize sound sources".

VGG-Sound Source

Data can be downloaded from "Localizing Visual Sounds the Hard Way".

Extended Flickr-SoundNet

Data can be downloaded from Extended-Flickr-SoundNet

Extended VGG-Sound Source

Data can be downloaded from Extended-VGG-Sound Source

Model Zoo

We release the SLAVC model pre-trained on VGG-Sound 144k data, along with scripts for reproducing results on the Extended Flickr-SoundNet and Extended VGG-Sound Source benchmarks.

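| Method | Train Set | Test Set | AP | max-F1 | Precision | url | Train | Test |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SLAVC | VGG-Sound 144k | Extended Flickr-SoundNet | 51.63 | 59.10 | 83.60 | model | script | script |
| SLAVC | VGG-Sound 144k | Extended VGG-SS | 32.95 | 40.00 | 37.79 | model | script | script |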
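
For readers unfamiliar with the max-F1 column, the following is a minimal sketch of how such a number can be computed. It assumes that max-F1 means the best F1 score over all confidence thresholds. The function names, the `scores` list, and the boolean `labels` list are hypothetical and are not taken from this repository; how the evaluation scripts decide that a prediction counts as correct is not shown here.

```python
def precision_recall_f1(scores, labels, threshold):
    """Precision, recall and F1 when samples scoring >= threshold are predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def max_f1(scores, labels):
    """Best F1 over every threshold taken from the observed scores (assumed definition)."""
    return max(precision_recall_f1(scores, labels, t)[2] for t in set(scores))


# Hypothetical confidences and ground-truth flags, for illustration only.
example_scores = [0.9, 0.8, 0.4, 0.3]
example_labels = [True, False, True, False]
best = max_f1(example_scores, example_labels)
```

Sweeping the threshold over the observed scores is sufficient because F1 can only change where a sample crosses the threshold.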

Train

To train an SLAVC model, run:

python train.py --multiprocessing_distributed \
    --train_data_path /path/to/VGGSound-all/ \
    --test_data_path /path/to/Flickr-SoundNet/ \
    --test_gt_path /path/to/Flickr-SoundNet/Annotations/ \
    --experiment_name vggss144k_slavc \
    --model 'slavc' \
    --trainset 'vggss_144k' \
    --testset 'flickr' \
    --epochs 20 \
    --batch_size 128 \
    --init_lr 0.0001 \
    --use_momentum --use_mom_eval \
    --m_img 0.999 --m_aud 0.999 \
    --dropout_img 0.9 --dropout_aud 0
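
The `--use_momentum`, `--m_img`, and `--m_aud` flags suggest that the image and audio encoders each keep a slowly updated momentum copy. The sketch below shows the common exponential-moving-average update such flags usually control. It is an assumption about the general technique, not the code in `train.py`; `ema_update` and the plain-dict parameter containers are hypothetical stand-ins for real model weights.

```python
def ema_update(target, online, momentum=0.999):
    """Move each target parameter toward its online counterpart, in place.

    target, online: dicts mapping parameter names to floats (hypothetical
    stand-ins for model weights). A momentum close to 1 makes the target change slowly.
    """
    for name, value in online.items():
        target[name] = momentum * target[name] + (1.0 - momentum) * value
    return target


# Illustration only: one update step with a deliberately small momentum.
target_params = {"w": 1.0}
online_params = {"w": 0.0}
ema_update(target_params, online_params, momentum=0.9)
```

With a momentum of 0.999, as in the command above, each step would move the target only 0.1% of the way toward the online weights.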

Test

For testing and visualization, run:

python test.py --test_data_path /path/to/Extended-VGGSound-test/ \
    --model_dir checkpoints \
    --experiment_name vggss144k_slavc \
    --testset 'vggss_plus_silent' \
    --alpha 0.9 \
    --relative_prediction

Citation

If you find this repository useful, please cite our paper:

@inproceedings{mo2022SLAVC,
  title={A Closer Look at Weakly-Supervised Audio-Visual Source Localization},
  author={Mo, Shentong and Morgado, Pedro},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}

License: Apache License 2.0

