There are 5 repositories under the audio-visual topic.
A curated list of different papers and datasets in various areas of audio-visual processing
ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'
Implementation of "EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition, ICCV, 2019" in PyTorch
This repo contains the official PyTorch implementation of: Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation
Human Emotion Understanding using a multimodal dataset.
🎙 Generate waveform paths for SVG 🎶
An audio visualizer for React. Provides separate components to visualize both live audio and audio blobs.
Source code for "Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors." (Spotlight at BMVC 2022)
Programmatic minimalistic audio visualizations.
Audio Visual Scene-Aware Dialog (AVSD) Challenge at the 10th Dialog System Technology Challenge (DSTC)
Transformer-based online speech recognition system with TensorFlow 2
Audio-Visual Corruption Modeling code for our paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring" (CVPR 2023)
[CVPR 2023] Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception
Code for CVPR 2021 paper Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing
Accepted by TMM 2022
Efficient synchronization from sparse cues
Towards Intelligibility-Oriented Audio-Visual Speech Enhancement
Towards Audio-Visual Saliency Prediction for Omnidirectional Video with Spatial Audio
Official code for WACV 2024 paper, "Annotation-free Audio-Visual Segmentation"
Code and datasets for 'Move2Hear: Active Audio-Visual Source Separation' (ICCV 2021)
An internet radio player with an audio visualizer, built with the VueJS, Vuetify and Howler.JS frameworks. The player includes a selection of radio stations.
Audio-Visual Generalized Zero-Shot Learning using Large Pre-Trained Models
Segment-level autoencoders for multimodal representation
Repository of champion solutions for the Perception Test challenges at the ICCV 2023 workshop.
Attention-based Temporal Binding Network
Urban Sound & Sight dataset and baseline
Library to convert image files to audio files and vice versa
Official repo for "Audio-Visual Speech Recognition In-the-Wild: Multi-Angle Vehicle Cabin Corpus and Attention-based Method" in ICASSP 2024
A standardized way to record and store the findings of an inspection of analogue film, documenting its state at the moment of digitization
Repository for BFI National Archive open-source preservation workflow scripts
🎵 Tutorial showing how to use audio analysers to update a WebGL scene 🔊