Piyush Bagad's repositories
TestOfTime
Official code for our CVPR 2023 paper: Test of Time: Instilling Video-Language Models with a Sense of Time
rotation-equivariant-lfm
Rotation equivariance meets local feature matching
dino-local
PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
new-machine-setup-scripts
A collection of scripts useful when setting up a new machine
NLP-CS671A
Course files for CS671A - Natural Language Processing
sound-guided-semantic-image-manipulation
Sound-guided Semantic Image Manipulation - Official Pytorch Code (CVPR 2022)
Sound2Scene
Clone of the Sound2Scene repo. Need to train on pouring water images.
TempCompass
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, Lei Li, Sishuo Chen, Xu Sun, Lu Hou
transparent-liquid-segmentation
We build a novel self-supervised segmentation pipeline to segment transparent liquids (clear water) placed inside transparent containers.
audio_codec_tests
Tests for codec artefacts in stored audio samples.
bpiyush.github.io
A portfolio page
ddsp-pytorch
Implementation of DDSP (PyTorch), Differentiable Digital Signal Processing (ICLR 2020)
digan
Official PyTorch implementation of Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks (ICLR 2022).
InternVideo
Video Foundation Models & Data for Multimodal Understanding
LanguageBind
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
PhysParamInference
Clone of the WACV 2023 paper's code. Adaptation for pouring water.
TimeChat
[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
unmasked_teacher
[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
VideoLLaMA2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
VideoMAE-ssl
[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
ViLMA
ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models (ICLR 2024, Official Implementation)
VTimeLLM
[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
YouTube-scrapper-tutorial
Tutorial on scraping YouTube videos for research purposes.