starriness's starred repositories

Language: MATLAB | License: GPL-3.0 | Stargazers: 9906 | Issues: 0

Micro-Action

[TCSVT 2024] Official implementation of the paper: Benchmarking Micro-action Recognition: Dataset, Methods, and Applications

Language: Jupyter Notebook | Stargazers: 12 | Issues: 0

LFAV

Towards Long Form Audio-visual Video Understanding

Language: Python | License: MIT | Stargazers: 7 | Issues: 0

APL

APL for AVQA task

Language: Python | Stargazers: 2 | Issues: 0

CVPR2023-CMPAE

[CVPR 2023] Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception

Language: Python | License: MIT | Stargazers: 32 | Issues: 0

MiGA2023_Track1

[IJCAI 2023] The champion of the Micro-gesture Classification sub-challenge at MiGA@IJCAI2023.

Language: Python | License: Apache-2.0 | Stargazers: 8 | Issues: 0

CPSP

[2023 TPAMI] Contrastive Positive Sample Propagation along the Audio-Visual Event Line

Language: Python | Stargazers: 21 | Issues: 0

OGM-GE_CVPR2022

The repo for "Balanced Multimodal Learning via On-the-fly Gradient Modulation", CVPR 2022 (ORAL)

Language: Python | License: MIT | Stargazers: 213 | Issues: 0

DG-SCT

NeurIPS'2023 official implementation code

Language: Python | Stargazers: 52 | Issues: 0

Non-local_pytorch

Implementation of Non-local Block.

Language: Python | License: Apache-2.0 | Stargazers: 1566 | Issues: 0

cross_modal_adaptation

Cross-modal few-shot adaptation with CLIP

Language: Python | License: MIT | Stargazers: 294 | Issues: 0

LM4VisualEncoding

[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"

Language: Python | License: MIT | Stargazers: 211 | Issues: 0

AudioGPT

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

Language: Python | License: NOASSERTION | Stargazers: 9904 | Issues: 0

TransnormerLLM

Official implementation of TransNormerLLM: A Faster and Better LLM

Language: Python | License: Apache-2.0 | Stargazers: 220 | Issues: 0

ALBEF

Code for ALBEF: a new vision-language pre-training method

Language: Python | License: BSD-3-Clause | Stargazers: 1458 | Issues: 0

up-to-date-Vision-Language-Models

Up-to-date collection of Vision Language Models, mainly focused on computer vision.

Stargazers: 17 | Issues: 0

LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Language: Jupyter Notebook | License: BSD-3-Clause | Stargazers: 9250 | Issues: 0

FNAC_AVL

[CVPR 2023] Official implementation of our paper - Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning

Language: Python | Stargazers: 22 | Issues: 0

FAVDBench

[CVPR 2023] Official implementation of the paper: Fine-grained Audible Video Description

Language: Python | License: Apache-2.0 | Stargazers: 72 | Issues: 0

Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 14227 | Issues: 0

UniDetector

Code release for our CVPR 2023 paper "Detecting Everything in the Open World: Towards Universal Object Detection".

Language: Python | License: Apache-2.0 | Stargazers: 517 | Issues: 0

EditAnything

Edit anything in images powered by segment-anything, ControlNet, StableDiffusion, etc. (ACM MM)

Language: Python | License: Apache-2.0 | Stargazers: 3234 | Issues: 0

Awesome-Masked-Autoencoders

A collection of literature after or concurrent with Masked Autoencoder (MAE) (Kaiming He et al.).

License: MIT | Stargazers: 733 | Issues: 0

Transnormer

[EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer

Language: Python | Stargazers: 53 | Issues: 0

Tnn

[ICLR 2023] Official implementation of Transnormer in our ICLR 2023 paper - Toeplitz Neural Network for Sequence Modeling

Language: Python | Stargazers: 70 | Issues: 0

awesome-self-supervised-learning

A curated list of awesome self-supervised methods

Stargazers: 6065 | Issues: 0

SSL-TIE

Official code for the ACM MM 2022 paper "Exploiting Transformation Invariance and Equivariance for Self-supervised Sound Localisation"

Language: Python | License: MIT | Stargazers: 4 | Issues: 0

MockingBird

🚀 AI voice cloning: clone your voice within 5 seconds and generate arbitrary speech content in real time

Language: Python | License: NOASSERTION | Stargazers: 34549 | Issues: 0

TemporalPyramidRouting

Temporal Pyramid Routing for Video Instance Segmentation (T-PAMI 2022)

Language: Python | License: Apache-2.0 | Stargazers: 25 | Issues: 0

CaFo

[CVPR 2023] Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners

Language: Python | License: MIT | Stargazers: 335 | Issues: 0