jasongief

starriness's starred repositories

ShiArthur03

Language: MATLAB | License: GPL-3.0 | 990600

Micro-Action

[TCSVT 2024] Official implementation of the paper: Benchmarking Micro-action Recognition: Dataset, Methods, and Applications

Language: Jupyter Notebook | 1200

LFAV

Towards Long Form Audio-visual Video Understanding

Language: Python | License: MIT | 700

APL

APL for AVQA task

Language: Python | 200

CVPR2023-CMPAE

[CVPR 2023] Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception

Language: Python | License: MIT | 3200

MiGA2023_Track1

[IJCAI 2023] The champion of the Micro-gesture Classification sub-challenge in MiGA@IJCAI2023.

Language: Python | License: Apache-2.0 | 800

CPSP

[2023 TPAMI] Contrastive Positive Sample Propagation along the Audio-Visual Event Line

Language: Python | 2100

OGM-GE_CVPR2022

The repo for "Balanced Multimodal Learning via On-the-fly Gradient Modulation", CVPR 2022 (ORAL)

Language: Python | License: MIT | 21300

DG-SCT

NeurIPS'2023 official implementation code

Language: Python | 5200

Non-local_pytorch

Implementation of Non-local Block.

Language: Python | License: Apache-2.0 | 156600

cross_modal_adaptation

Cross-modal few-shot adaptation with CLIP

Language: Python | License: MIT | 29400

LM4VisualEncoding

[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"

Language: Python | License: MIT | 21100

AudioGPT

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

Language: Python | License: NOASSERTION | 990400

TransnormerLLM

Official implementation of TransNormerLLM: A Faster and Better LLM

Language: Python | License: Apache-2.0 | 22000

ALBEF

Code for ALBEF: a new vision-language pre-training method

Language: Python | License: BSD-3-Clause | 145800

up-to-date-Vision-Language-Models

An up-to-date collection of Vision Language Models, mainly focused on computer vision.

1700

LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Language: Jupyter Notebook | License: BSD-3-Clause | 925000

FNAC_AVL

[CVPR 2023] Official implementation of our paper - Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning

Language: Python | 2200

FAVDBench

[CVPR 2023] Official implementation of the paper: Fine-grained Audible Video Description

Language: Python | License: Apache-2.0 | 7200

Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Language: Jupyter Notebook | License: Apache-2.0 | 1422700

UniDetector

Code release for our CVPR 2023 paper "Detecting Everything in the Open World: Towards Universal Object Detection".

Language: Python | License: Apache-2.0 | 51700

EditAnything

Edit anything in images powered by segment-anything, ControlNet, StableDiffusion, etc. (ACM MM)

Language: Python | License: Apache-2.0 | 323400

Awesome-Masked-Autoencoders

A collection of literature after or concurrent with Masked Autoencoder (MAE) (Kaiming He et al.).

License: MIT | 73300

Transnormer

[EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer

Language: Python | 5300

Tnn

[ICLR 2023] Official implementation of our ICLR 2023 paper - Toeplitz Neural Network for Sequence Modeling

Language: Python | 7000

awesome-self-supervised-learning

A curated list of awesome self-supervised methods

SSL-TIE

Official code for the ACM MM 2022 paper "Exploiting Transformation Invariance and Equivariance for Self-supervised Sound Localisation"

Language: Python | License: MIT | 400

MockingBird

🚀 AI voice cloning: clone your voice within 5 seconds and generate arbitrary speech content. Clone a voice in 5 seconds to generate arbitrary speech in real-time

Language: Python | License: NOASSERTION | 3454900

TemporalPyramidRouting

Temporal Pyramid Routing for Video Instance Segmentation (T-PAMI 2022)

Language: Python | License: Apache-2.0 | 2500

CaFo

[CVPR 2023] Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners

Language: Python | License: MIT | 33500