Andrew Rouditchenko's repositories
MUSIC_dataset
MUSIC Dataset from The Sound of Pixels (ECCV '18)
whisper-flamingo
[Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
Sound-of-Pixels
Codebase for "The Sound of Pixels" (ECCV '18)
video_feature_extractor
Easy-to-use deep feature extractor for videos
awesome-video-text-retrieval
A curated list of deep learning resources for video-text retrieval.
everything_at_once
Implementation of "Everything at Once - Multi-modal Fusion Transformer for Video Retrieval" (CVPR 2022)
MIL-NCE_HowTo100M
PyTorch GPU distributed training code for MIL-NCE HowTo100M
MIT-6.058-Notebook
Collection of notebooks I made for the MIT IAP Signals and Systems class
Spoken-ObjectNet
Official code for Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset (Interspeech 2021)