Piyush Bagad's repositories
TestOfTime
Official code for our CVPR 2023 paper: Test of Time: Instilling Video-Language Models with a Sense of Time
rotation-equivariant-lfm
Rotation equivariance meets local feature matching
dino-local
PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
new-machine-setup-scripts
A collection of scripts useful when setting up a new machine
NLP-CS671A
Course files for CS671A - Natural Language Processing
sound-guided-semantic-image-manipulation
Sound-guided Semantic Image Manipulation - Official Pytorch Code (CVPR 2022)
Sound2Scene
Clone of the Sound2Scene repo. Need to train on pouring water images.
TempCompass
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, Lei Li, Sishuo Chen, Xu Sun, Lu Hou
transparent-liquid-segmentation
We build a novel self-supervised segmentation pipeline to segment transparent liquids (clear water) placed inside transparent containers.
audio_codec_tests
Tests for codec artefacts in stored audio samples.
bpiyush.github.io
A portfolio page
ddsp-pytorch
Implementation of DDSP (PyTorch), Differentiable Digital Signal Processing (ICLR 2020)
digan
Official PyTorch implementation of Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks (ICLR 2022).
InternVideo
Video Foundation Models & Data for Multimodal Understanding
LanguageBind
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
PhysParamInference
Clone of the WACV 2023 paper's code. Adaptation for pouring water.
TimeChat
[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
unmasked_teacher
[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
VideoLLaMA2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
VideoMAE-ssl
[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
ViLMA
ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models (ICLR 2024, Official Implementation)
VTimeLLM
[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
YouTube-scrapper-tutorial
Tutorial on scraping YouTube videos for research purposes.