Mortyzhou-Shef-BIT

yhzhouowo's starred repositories

VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Language: Jupyter Notebook | License: NOASSERTION | 7265 88 112

ProPainter

[ICCV 2023] ProPainter: Improving Propagation and Transformer for Video Inpainting

Language: Python | License: NOASSERTION | 5154 49 77

swift

ms-swift: Use PEFT or Full-parameter to finetune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3, Llava-Video, Internvl2, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)

Language: Python | License: Apache-2.0 | 2502 20 696

FeatUp

Official code for "FeatUp: A Model-Agnostic Framework for Features at Any Resolution" ICLR 2024

Language: Jupyter Notebook | License: MIT | 1303 18 56

ISAT_with_segment_anything

Labeling tool with SAM (Segment Anything Model); supports SAM, sam-hq, MobileSAM, EdgeSAM, etc. Interactive semi-automatic image annotation tool.

Language: Python | License: NOASSERTION | 1097 9 151

ODISE

Official PyTorch implementation of ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models [CVPR 2023 Highlight]

Language: Python | License: NOASSERTION | 831 40 42

ovsam

[arXiv preprint] The official code of paper "Open-Vocabulary SAM".

Language: Python | License: NOASSERTION | 725 14 29

VideoMamba

[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding

Language: Python | License: Apache-2.0 | 713 12 71

multimodal-prompt-learning

[CVPR 2023] Official repository of paper titled "MaPLe: Multi-modal Prompt Learning".

Language: Python | License: MIT | 584 6 75

speech-trident

Awesome speech/audio LLMs, representation learning, and codec models

511 30 2

soft-vc

Soft speech units for voice conversion

Language: Jupyter Notebook | License: MIT | 391 12 14

ltu

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

Language: Python | 337 14 43

Implementation of Vision Mamba from the paper "Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model". It is 2.8x faster than DeiT and saves 86.8% GPU memory when performing batch inference to extract features on high-res images.

Language: Python | License: MIT | 317 6 17

BARTScore

BARTScore: Evaluating Generated Text as Text Generation

Language: Python | License: Apache-2.0 | 310 7 44

Awesome-Human-Activity-Recognition

An up-to-date, curated list of awesome IMU-based Human Activity Recognition (Ubiquitous Computing) papers, methods, and resources. Note that most of the collected research is based on IMU data.

License: MIT | 227 140

PromptSRC

[ICCV'23 Main Track, WECIA'23 Oral] Official repository of paper titled "Self-regulating Prompts: Foundational Model Adaptation without Forgetting".

Language: Python | License: MIT | 203 5 15

awesome-self-supervised-multimodal-learning

[T-PAMI] A curated list of self-supervised multimodal learning resources.

191 5 1

ViP-LLaVA

[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

Language: Python | License: Apache-2.0 | 148 7 11

MMStar

This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"

Language: Python | 128 1 8

Recent-Image-Quality-Related-Papers

A list of image quality related papers published in top conferences and journals

109 5 1

SPMamba

Language: Python | 99 3 11

class-incremental-learning

PyTorch implementation of a VAE-based generative classifier, as well as other class-incremental learning methods that do not store data (DGR, BI-R, EWC, SI, CWR, CWR+, AR1, the "labels trick", SLDA).

Language: Python | License: MIT | 70 2 5

PEL4VAD

Official code for "Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection"

Language: Jupyter Notebook | License: MIT | 56 4 18

HammerLLM

1.4B sLLM for Chinese and English - HammerLLM🔨

Language: Python | License: MIT | 42 4 1

Multimodal-Learning-with-Alternating-Unimodal-Adaptation

Multimodal Learning Method MLA for CVPR 2024

Language: Python | 3000

ICLR2024-REDL

[ICLR 2024 Spotlight] R-EDL: Relaxing Nonessential Settings of Evidential Deep Learning

Language: Python | License: MIT | 29 1 1

JoMoLD

[ECCV 2022] Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing

Language: Python | 26 2 2

ACES

Audio Captioning Evaluation on Semantics of Sound (ACES)

Language: Jupyter Notebook | License: MIT | 8 10

r2bench

[ECCV 2024] R2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations

Language: Python | 800