Max Bain's repositories
frozen-in-time
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]
CondensedMovies
Story-Based Retrieval with Contextual Embeddings. Largest freely available movie video dataset. [ACCV'20]
video-transformers
Implementations of Transformers for Video
CondensedMovies-chall
Condensed Movies Challenge 2021
clip-hitchhiker
A CLIP-Hitchhiker's Guide to Long Video Retrieval [arXiv 2022]
pytorch-multi-label-classifier
A PyTorch classifier for multi-label classification
SimpleDiarization
Simple Diarization model
collaborative-experts
Video embeddings for retrieval - code for the paper "Use What You Have: Video retrieval using representations from collaborative experts"
conceptual-12m
Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.
primate-behaviour-recognition
Automated Audiovisual Behaviour Recognition in Wild Primates
pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
pytorch-image-models
PyTorch image models, scripts, pretrained weights -- (SE)ResNet/ResNeXT, DPN, EfficientNet, MixNet, MobileNet-V3/V2, MNASNet, Single-Path NAS, FBNet, and more
slurm_gpustat
A simple command line tool to show GPU usage on a SLURM cluster
torchvggish
PyTorch port of Google Research's VGGish model used for extracting audio features.
video2dataset
Easily create large video datasets from video URLs
bert-as-service
Mapping a variable-length sentence to a fixed-length vector using a BERT model
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
video_features
Extract video features from raw videos using multiple GPUs. We support RAFT and PWC flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, ResNet features.
whisper-asr-webservice
OpenAI Whisper ASR Webservice API